Tristan Barbeyron, Loraine Brillet-Guéguen, Wilfrid Carré, Cathelène Carrière, Christophe Caron, Mirjam Czjzek, Mark Hoebeke, Gurvan Michel.
Abstract
Sulfatases cleave sulfate groups from various molecules and constitute a biologically and industrially important group of enzymes. However, the number of sulfatases whose substrate has been characterized is limited in comparison to the huge diversity of sulfated compounds, which makes functional annotations of sulfatases particularly prone to flaws and misinterpretations. In the context of the explosion of genomic data, a classification system that allows better prediction of substrate specificity and sets the limits of functional annotation is urgently needed for sulfatases. Here, after an overview of the diversity of sulfated compounds and of the known sulfatases, we propose a classification database, SulfAtlas (http://abims.sb-roscoff.fr/sulfatlas/), based on sequence homology and composed of four families of sulfatases. The formylglycine-dependent sulfatases, which constitute the largest family, are further divided by a phylogenetic approach into 73 subfamilies, each subfamily corresponding to either a known specificity or to an uncharacterized substrate. SulfAtlas summarizes information about the different families of sulfatases. Within a family, a web page displays the list of its subfamilies (when they exist) and the list of EC numbers. The family or subfamily page shows some descriptors and a table with all the UniProt accession numbers linked to the databases UniProt, ExplorEnz, and PDB.
Year: 2016 PMID: 27749924 PMCID: PMC5066984 DOI: 10.1371/journal.pone.0164846
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Sulfatases of known substrate specificity.
The proteins have been sorted according to their EC numbers.
| Protein name / Family | Gene name | Organism | EC number | UniProt code | PDB code | References |
|---|---|---|---|---|---|---|
| Arylsulfatase / S1_4 | | | 3.1.6.1 | P20713 | - | [ |
| Arylsulfatase / S1_4 | | | 3.1.6.1 | P51691 | 1hdh | [ |
| Arylsulfatase (tyrosine sulfatase) / S1_6 | | | 3.1.6.1 | Q10723 | - | [ |
| Arylsulfatase / S4 | | | 3.1.6.1 | P28607 | - | [ |
| Steryl-sulfatase / S1_3 | STS (ARSC) | | 3.1.6.2 | P08842 | 1p49 | [ |
| Steryl-sulfatase / S1_3 | STS (ARSC) | | 3.1.6.2 | P15589 | | [ |
| Steryl-sulfatase / S1_3 | STS (ARSC) | | 3.1.6.2 | P50427 | | [ |
| N-acetylgalactosamine-6-sulfatase / S1_5 | GALNS | | 3.1.6.4 | P34059 | 4fdi | [ |
| N-acetylgalactosamine-6-sulfatase / S1_5 | GALNS | | 3.1.6.4 | Q571E4 | | [ |
| N-acetylgalactosamine-6-sulfatase / S1_5 | GALNS | | 3.1.6.4 | Q8WNQ7 | | [ |
| Choline-sulfatase / S1_12 | | | 3.1.6.6 | O69787 | - | [ |
| Cerebroside sulfatase / S1_1 | ARSA | | 3.1.6.8 | P15289 | 1auk | [ |
| Cerebroside sulfatase / S1_1 | ARSA | | 3.1.6.8 | P50428 | | [ |
| N-acetylgalactosamine-4-sulfatase / S1_2 | ARSB | | 3.1.6.12 | P15848 | 1fsu | [ |
| N-acetylgalactosamine-4-sulfatase / S1_2 | ARSB | | 3.1.6.12 | P33727 | | [ |
| N-acetylgalactosamine-4-sulfatase / S1_2 | ARSB | | 3.1.6.12 | P50430 | | [ |
| N-acetylgalactosamine-4-sulfatase / S1_2 | ARSB | | 3.1.6.12 | P50429 | | [ |
| Iduronate 2-sulfatase / S1_7 | IDS | | 3.1.6.13 | P22304 | - | [ |
| Iduronate 2-sulfatase / S1_7 | IDS | | 3.1.6.13 | Q08890 | - | [ |
| Heparin/heparan sulfate 2-O-sulfatase / S1_9 | | | 3.1.6.13 | C6Y1N2 | - | [ |
| N-acetylglucosamine-6-sulfatase / S1_6 | GNS | | 3.1.6.14 | P15586 | - | [ |
| N-acetylglucosamine-6-sulfatase / S1_6 | GNS | | 3.1.6.14 | P50426 | - | [ |
| Mucin-desulfating sulfatase / S1_11 | | | 3.1.6.14 | Q9L5W0 | - | [ |
| Extracellular sulfatase 1 (N-acetylglucosamine-6-sulfatase) / S1_6 | SULF1 | | 3.1.6.14 | Q90XB6 | - | [ |
| Extracellular sulfatase 1 (N-acetylglucosamine-6-sulfatase) / S1_6 | SULF1 | | 3.1.6.14 | Q8IWU6 | - | [ |
| Extracellular sulfatase 1 (N-acetylglucosamine-6-sulfatase) / S1_6 | SULF1 | | 3.1.6.14 | Q8K007 | - | [ |
| Extracellular sulfatase 2 (N-acetylglucosamine-6-sulfatase) / S1_6 | SULF2 | | 3.1.6.14 | Q8IWU5 | - | [ |
| Extracellular sulfatase 2 (N-acetylglucosamine-6-sulfatase) / S1_6 | SULF2 | | 3.1.6.14 | Q8CFG0 | - | [ |
| Heparin/heparan sulfate 6-O-sulfatase / S1_11 | Phep_2827 | | 3.1.6.14 | C6Y1N4 | - | [ |
| Sec-alkylsulfatase / S3 | | | 3.1.6.19 | F8KAY7 | 2yhe | [ |
| N-sulfoglucosamine sulfohydrolase / S1_8 | SGSH | | 3.10.1.1 | P51688 | 4miv | [ |
| Heparin/heparan sulfate N-sulfamidase / S1_8 | | | 3.10.1.1 | C6Y1N3 | - | [ |
| Alkylsulfatase / S2 | | | 1.14.11.- | Q9WWU5 | 1oih | [ |
| Alpha-ketoglutarate-dependent sulfate ester dioxygenase / S2 | | | 1.14.11.- | P9WKZ1 | 4cvy | [ |
| Endo-4S-kappa-carrageenan sulfatase / S1_7 | Patl_0891 | | 3.1.6.- | Q15XH1 | | [ |
| Glucosinolate sulfatase / S1_10 | - | | 3.1.6.- | Q8MM72 | - | [ |
| Endo-4S-iota-carrageenan sulfatase / S1_19 | Patl_0889 | | 3.1.6.- | Q15XH3 | | [ |
| Endo-4S-kappa-carrageenan sulfatase / S1_19 | Patl_0895 | | 3.1.6.- | Q15XG7 | | [ |
| Alkylsulfatase / S3 | | | 3.1.6.- | Q9I5I9 | 2cfu | [ |
| Alkylsulfatase / S3 | | | 3.1.6.- | F2WP51 | 4nur | unpublished |
| Phosphonate monoester hydrolase / phosphodiesterase / S1_0 | | | 3.1.-.- | Q45087 | 2w8s | [ |
| Phosphonate monoester hydrolase / phosphodiesterase / S1_0 | | | 3.1.-.- | Q1M964 | 2vqr | [ |
Identity scores for pairwise sequence comparisons of the formylglycine-dependent sulfatases of known substrate specificity.
For each entry, the bold numbers correspond to the identity score for full length sequences, while the numbers in italics correspond to the identity score after editing of the multiple sequence alignment.
| | ARSA | ARSB | ARSC | AtsAp | AtsAk | GALNS | GNS | SULF2 | SULF1 | IDSh | SGSH | ID2Sp | GlcS | MdSA | betC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARSA | 100 | | | | | | | | | | | | | | |
| ARSB | | 100 | | | | | | | | | | | | | |
| ARSC | | | 100 | | | | | | | | | | | | |
| AtsAp | | | | 100 | | | | | | | | | | | |
| AtsAk | | | | | 100 | | | | | | | | | | |
| GALNS | | | | | | 100 | | | | | | | | | |
| GNS | | | | | | | 100 | | | | | | | | |
| SULF2 | | | | | | | | 100 | | | | | | | |
| SULF1 | | | | | | | | | 100 | | | | | | |
| IDSm | | | | | | | | | | 100 | | | | | |
| SGSH | | | | | | | | | | | 100 | | | | |
| ID2Sp | | | | | | | | | | | | 100 | | | |
| GlcS | | | | | | | | | | | | | 100 | | |
| MdsA | | | | | | | | | | | | | | 100 | |
| BetC | | | | | | | | | | | | | | | 100 |
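As an illustration of what a pairwise identity score measures, the following minimal Python sketch computes percent identity between rows of a multiple sequence alignment. It is not the authors' procedure: the choice to skip every column containing a gap, the function names `percent_identity` and `identity_matrix`, and the toy sequences are all assumptions made for the example.

```python
def percent_identity(a, b):
    """Percent identity between two rows of the same alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: columns with a gap in either row are skipped
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Map every ordered pair of sequence names to its percent identity."""
    names = list(alignment)
    return {
        (p, q): percent_identity(alignment[p], alignment[q])
        for p in names
        for q in names
    }


# Toy, made-up aligned fragments (hypothetical, not real sulfatase sequences).
toy_alignment = {
    "seqA": "MKT-LLCA",
    "seqB": "MKTALLSA",
    "seqC": "MRT-LICA",
}
scores = identity_matrix(toy_alignment)
```

With this definition the diagonal is always 100 and the matrix is symmetric, so only one triangle needs to be reported. Running the same function on the raw alignment and on an edited (trimmed) alignment would give the two kinds of score distinguished in the table legend.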
Fig 1. Fold and active site of representatives from the different families of sulfatases.
S1 family: Fold (A) and active site (B) of the arylsulfatase AtsA from Pseudomonas aeruginosa PAO1 (PDB code: 1HDH) [35]. S2 family: Fold (C) and active site (D) of the alkylsulfatase AtsK from Pseudomonas putida S-313 (PDB code: 1OIK) [23]. S3 family: Fold (E) and active site (F) of the alkylsulfatase SdsA1 from Pseudomonas aeruginosa PAO1 (PDB code: 2CFU) [65]. The folds are shown in cartoon representation. The amino acids and ligands of the active sites are shown as sticks. The cations are shown as spheres. The figures were made using PyMOL (Version 1.8, Schrödinger, LLC).
Fig 2. Favored catalytic mechanism of the S1 family sulfatases.
The numbering corresponds to the arylsulfatase AtsA from Pseudomonas aeruginosa PAO1 [35]. Upon substrate binding, the formylglycine is activated for nucleophilic attack on sulfur by Asp317. The sulfoenzyme intermediate is formed, and desulfation most likely occurs by elimination from the remaining fGly-diol hydroxyl (E2), catalyzed by His115. This figure was adapted from references [35, 38] and prepared with Accelrys Draw 4.2.
Fig 3. Catalytic mechanism of the S2 family sulfatases.
The numbering corresponds to the alkylsulfatase AtsK from Pseudomonas putida S-313. First, iron and the cosubstrate alpha-ketoglutarate (KG) coordinate to the enzyme. Second, the alkyl sulfate binds to the active site, displacing a water molecule from the iron center and leaving a coordinatively unsaturated iron atom. Subsequently, a dioxygen molecule binds the iron cation. One oxygen atom of the dioxygen is transferred to KG, yielding succinate and carbon dioxide as products. The iron is thereby oxidized, and a ferryl Fe(IV)=O species is formed, which then hydroxylates the alkyl sulfate via a radical intermediate. Finally, the sulfate ion and succinate are released, and two water molecules complete the iron coordination sphere. This figure was adapted from [23, 119, 120].
Fig 4. Logos of conserved consensus sequences identified in the global alignment of FGly-sulfatases.
Logos of conserved consensus sequences were identified from 4058 aligned FGly-sulfatases. The logo sequence of the catalytic site, which corresponds to the PROSITE signature PS00523, is shown in A. The logo sequence of PROSITE signature PS00149 is shown in B. The two logo sequences of calcium binding are shown in C and D. A logo sequence from a supplementary conserved consensus sequence is shown in E. The numbers below the logo sequences indicate, at the first position, the corresponding position in the reference sequence (AtsA P51691). The corresponding consensus sequences in the multiple alignment are shown below the logo sequences. The percentages in subscript are the percentages of sequences in which the amino acid is conserved in the alignment. Catalytic amino acids and residues involved in calcium ion binding are in bold.
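The per-position percentages described in this legend can be illustrated with a short Python sketch that derives a consensus residue and its conservation for each alignment column. This is a simplified stand-in, not the tool used to draw the logos; the function name `column_conservation`, the decision to count gaps in the denominator, and the toy motif instances are assumptions made for the example.

```python
from collections import Counter


def column_conservation(seqs):
    """Return (consensus residue, percent of sequences carrying it) per column."""
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*seqs):
        residues = [c for c in column if c != "-"]
        if not residues:
            result.append(("-", 0.0))  # column made only of gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # assumption: the percentage is relative to all sequences, gaps included
        result.append((residue, 100.0 * count / len(seqs)))
    return result


# Toy, made-up five-residue motif instances (hypothetical data).
toy_motifs = ["CTPSR", "CSPSR", "CTPAR", "SAPSR"]
conservation = column_conservation(toy_motifs)
consensus = "".join(residue for residue, _ in conservation)
```

A sequence logo conveys more than this (it scales each letter by information content), but the consensus string with subscripted percentages shown under each logo corresponds to this kind of per-column count.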
Fig 5. Phylogenetic tree of the Fe(II) alpha-ketoglutarate-dependent dioxygenase superfamily.
The tree was obtained by maximum likelihood with RAxML using the substitution matrix WAG from 211 positions of an alignment of 469 sequences belonging to the alpha-ketoglutarate-dependent dioxygenase superfamily, closely related to the TauD, TfdA and AtsK families. The colored clades contain the characterized sequences TauD (taurine dioxygenase, P37610), TfdA (2,4-dichlorophenoxyacetate dioxygenase, P10088) and AtsK (alkylsulfatase, Q9WWU5). The black clades and the isolated sequences (not supported by high bootstrap values) contain no biochemically characterized enzymes. The S2 family of sulfatases is shown in red. All the resolved three-dimensional structures are indicated. Only bootstrap values above 60% are shown.
Fig 6. Logos of conserved consensus sequences identified in the global alignment of alkylsulfodioxygenases (family S2) and alkylsulfohydrolases (family S3).
Logo sequences were identified from 111 aligned alkylsulfodioxygenases (A) and 354 aligned alkylsulfohydrolases (B). The numbers below the logo at the first position indicate the corresponding position in the reference sequences (AtsK Q9WWU5 in A and SdsA1 Q9I5I9 in B). The corresponding consensus sequences in the multiple alignments are shown below the logo sequences. The percentages in subscript are the percentages of sequences in which the amino acid is conserved in the alignments. Amino acids involved in sulfate binding are in bold.
Fig 7. Phylogenetic tree of the zinc metallo-β-lactamase superfamily.
The tree was obtained by maximum likelihood with RAxML using the substitution matrix WAG from 96 positions of an alignment of 288 sequences belonging to various families of the zinc metallo-β-lactamase superfamily. The colored clades contain the sequences with known activities. The black clades and the isolated sequences (not supported by high bootstrap values) contain no biochemically characterized enzymes. The S3 and S4 families of sulfatases are shown in red. All the resolved three-dimensional structures are indicated. The family numbers refer to Daiyasu's groups [79]. Only bootstrap values above 50% are shown.
Fig 8. SulfAtlas website.
Example of the “sulfatase family S1” page. Within each family, a text presents the current knowledge of the sulfatases concerned; for the S1 family, the list of subfamilies and the distribution of known EC numbers among them are also shown.