Abstract
Genome sequencing projects are generating enormous amounts of biological data that require analysis, which in turn identifies genes and proteins that require characterization. Enzymes that act on proteins are especially difficult to characterize because of the time required to distinguish one from another. This is particularly true of peptidases, the enzymes that activate, inactivate and degrade proteins. This article aims to identify clusters of sequences, each of which represents the species variants of a single putative peptidase that is widely distributed and thus merits biochemical characterization. The MEROPS database maintains large collections of sequences, references, substrate cleavage positions and inhibitor interactions of peptidases and their homologues. MEROPS also maintains a hierarchical classification of peptidase homologues, in which sequences are clustered as species variants of a single peptidase; homologous sequences are assembled into a family; and families are clustered into a clan. For each family, an alignment and a phylogenetic tree are generated. By assigning an identifier to a peptidase that has been biochemically characterized from a particular species (called a holotype), the identifier can be automatically extended to sequences from other species that cluster with the holotype. This permits transfer of annotation from the holotype to other members of the cluster. By extending this concept to all peptidase homologues (including those of unknown function that have not been characterized) from model organisms representing all the major divisions of cellular life, clusters of sequences representing putative peptidases can also be identified. The 42 most widely distributed of these putative peptidases have been identified and are discussed here, and they are prioritized as ideal candidates for biochemical characterization. Database URL: http://merops.sanger.ac.uk
Year: 2013 PMID: 23584835 PMCID: PMC3625958 DOI: 10.1093/database/bat022
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Increase in the number of peptidase homologue sequences, 1998–2012. The graph shows the cumulative number of peptidase homologue sequences added to the MEROPS database per year since 1998, when submission dates were first recorded. Also shown are the number of these sequences that have been assigned to a MEROPS identifier and the number of MEROPS identifiers.
Peptidases from model organisms
| Organism | Total | Holotypes | Holotypes with references | Holotypes with known substrate cleavage sites | Holotypes with inhibitor interactions | Uncharacterized holotypes |
|---|---|---|---|---|---|---|
| Human | 597 | 462 | 568 | 261 | 174 | 96 |
| Mouse | 635 | 221 | 151 | 60 | 30 | 100 |
| Drosophila | 467 | 407 | 92 | 27 | 17 | 330 |
| | 367 | 324 | 65 | 11 | 3 | 263 |
| | 576 | 520 | 127 | 20 | 14 | 398 |
| | 112 | 97 | 83 | 59 | 22 | 19 |
| | 115 | 55 | 17 | 3 | 1 | 39 |
| | 174 | 99 | 9 | 3 | 1 | 90 |
| | 106 | 94 | 91 | 56 | 24 | 33 |
| | 205 | 94 | 48 | 23 | 12 | 50 |
| | 70 | 27 | 9 | 7 | 3 | 17 |
There may still be some B. subtilis holotypes to identify.
For the purposes of this table, peptidases from retrotransposons (families A2 and A11) are excluded. An uncharacterized holotype is one with no references and no known cleavages or inhibitor interactions. In addition to the holotypes established for peptidases from E. coli strain K12 substrain MG1655, some 20 peptidases found not in this strain but in other strains of E. coli have been characterized, and holotypes have been set up for these; an example is colicin V processing peptidase (C39.005), which is encoded on a plasmid.
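The definition of an uncharacterized holotype given in this footnote can be expressed as a simple predicate. The following is a minimal sketch only: the `Holotype` record, its field names and the example counts are hypothetical and are not taken from the MEROPS schema.

```python
from dataclasses import dataclass

# Families excluded from the table (peptidases from retrotransposons).
RETROTRANSPOSON_FAMILIES = {"A2", "A11"}


@dataclass
class Holotype:
    """Hypothetical record; field names are illustrative assumptions."""
    identifier: str
    family: str
    n_references: int = 0
    n_cleavages: int = 0
    n_inhibitor_interactions: int = 0


def is_uncharacterized(h: Holotype) -> bool:
    """No references and no known cleavages or inhibitor interactions."""
    return (
        h.n_references == 0
        and h.n_cleavages == 0
        and h.n_inhibitor_interactions == 0
    )


def count_uncharacterized(holotypes) -> int:
    """Count uncharacterized holotypes, skipping retrotransposon families."""
    return sum(
        1
        for h in holotypes
        if h.family not in RETROTRANSPOSON_FAMILIES and is_uncharacterized(h)
    )


# Invented example records (the counts are placeholders, not MEROPS data).
examples = [
    Holotype("C39.005", "C39", n_references=1),
    Holotype("M03.A07", "M3"),
    Holotype("A02.XXX", "A2"),  # excluded: retrotransposon family
]
```

With these invented records, `count_uncharacterized(examples)` counts only the second entry, because the first has a reference and the third belongs to an excluded family.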
Peptidases common to model organisms
| Source organism for holotype | Human | Mouse | Drosophila | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | 597 | 462 | 51 | 31 | 28 | 15 | 19 | 31 | 1 | 1 | 1 |
| Mouse | | 635 | 49 | 28 | 25 | 15 | 19 | 31 | 1 | 1 | 2 |
| Drosophila | | | 467 | 25 | 19 | 11 | 15 | 19 | 1 | 1 | 1 |
| | | | | 367 | 14 | 13 | 15 | 14 | 1 | 2 | 0 |
| | | | | | 576 | 17 | 25 | 31 | 2 | 4 | 1 |
| | | | | | | 112 | 27 | 12 | 0 | 1 | 0 |
| | | | | | | | 115 | 14 | 1 | 1 | 1 |
| | | | | | | | | 174 | 3 | 2 | 1 |
| | | | | | | | | | 106 | 14 | 3 |
| | | | | | | | | | | 205 | 7 |
| | | | | | | | | | | | 70 |
This table shows the number of times the same MEROPS identifier has been assigned to sequences from different model organisms. Thus, of the 597 human peptidases, 462 sequences are assumed or known to represent peptidases with similar, if not identical, characteristics in mouse, but only 51 in Drosophila.
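The off-diagonal entries of such a table can be thought of as pairwise set intersections. The sketch below assumes that each organism is represented by the set of MEROPS identifiers assigned to its sequences; the organism keys and the `ID…` identifiers are placeholders, and the diagonal (the total for each organism) is not computed.

```python
from itertools import combinations


def shared_identifier_counts(assignments):
    """Count identifiers shared by each pair of organisms.

    `assignments` maps an organism name to the set of identifiers
    assigned to its sequences (an assumed input shape).
    """
    return {
        (a, b): len(assignments[a] & assignments[b])
        for a, b in combinations(assignments, 2)
    }


# Placeholder data, not taken from MEROPS.
toy = {
    "human": {"ID1", "ID2", "ID3"},
    "mouse": {"ID1", "ID2"},
    "fly": {"ID2", "ID4"},
}
counts = shared_identifier_counts(toy)
```

Each key of `counts` is an ordered pair of organisms in the order they appear in the input, which mirrors the upper-triangular layout of the table.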
Figure 2. Use of a phylogenetic tree to assign MEROPS identifiers. Part of the tree for family M3 subfamily A is shown. The tips correspond to individual sequences. An arrow indicates the tip corresponding to the sequence that is the holotype for a particular MEROPS identifier. Tips assigned to the same MEROPS identifier are bracketed together, and the MEROPS identifier is shown. Sequences that are not included within a bracket are unassigned. Tips 63–70 are derived from a node ancestral to M03.003 and M03.009; hence, they cannot be assigned to either identifier; similar situations apply to the unassigned tips 135–157, 158–166 and 184–246. The tree shows two identifiers for putative peptidases. M03.A03 was initially assigned to the At1g67690 gene product from A. thaliana and has been extended to include sequences from nine other plants. M03.A07, on the other hand, was originally assigned to the DDB_G0292362 gene product from the slime mould D. discoideum but cannot be extended to sequences from other species, because no others are derived from the same node on the tree.
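One way to read the assignment rule in this caption is as a tree traversal: an identifier extends from its holotype to every tip in the largest clade that contains no other identifier, and tips that hang directly off a node ancestral to two or more identifiers stay unassigned. The sketch below implements that simplified reading only; it is not the MEROPS procedure, and the nested-tuple tree, the tip names and the topology are invented for illustration.

```python
def assign_identifiers(tree, holotypes):
    """Extend holotype identifiers across clades of a rooted tree.

    `tree` is a nested tuple whose leaves are tip names (strings);
    `holotypes` maps a tip name to its identifier. Both shapes are
    assumptions made for this sketch.
    """
    assignments = {}

    def visit(node):
        # Returns (tips under node, identifiers of holotypes under node).
        if isinstance(node, str):
            ids = {holotypes[node]} if node in holotypes else set()
            return [node], ids
        results = [visit(child) for child in node]
        tips = [t for child_tips, _ in results for t in child_tips]
        ids = set().union(*(child_ids for _, child_ids in results))
        if len(ids) >= 2:
            # This node is ancestral to several identifiers, so each child
            # clade holding exactly one identifier is as large as it can get.
            for child_tips, child_ids in results:
                if len(child_ids) == 1:
                    ident = next(iter(child_ids))
                    for tip in child_tips:
                        assignments[tip] = ident
        return tips, ids

    all_tips, all_ids = visit(tree)
    if len(all_ids) == 1:
        # Only one identifier in the whole tree: it covers every tip.
        ident = next(iter(all_ids))
        for tip in all_tips:
            assignments[tip] = ident
    return assignments


# Invented topology; only the identifier strings come from the caption.
toy_tree = ((("t1", "t2"), ("t3", "t4")), "t5", ("t6", ("t7", "t8")))
toy_holotypes = {"t1": "M03.003", "t3": "M03.009", "t6": "M03.A07", "t7": "M03.A03"}
result = assign_identifiers(toy_tree, toy_holotypes)
```

In this toy tree, `t5` remains unassigned because it descends only from a node ancestral to several identifiers, and `M03.A07` stays on its holotype tip alone because its sister clade carries a different identifier.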
New holotypes for putative peptidases with the widest organism distribution
| MEROPS identifier | Peptidase name | Number of species | Phyla |
|---|---|---|---|
| C26.A05 | γ-Glutamyl peptidase 1 ( | 78 | Bacteria (Bacteroidetes, Proteobacteria, Xenobacteria) Archaea (Euryarchaeota) Fungi (Ascomycota, Basidiomycota, Chytridiomycota) Plantae (Tracheophyta) Animalia (Annelida) |
| C26.A28 | SPBPB2B2.05 ( | 297 | Bacteria (Acidobacteria, Bacteroidetes, Chlamydiae, Chloroflexi, Chrysiogenetes, Cyanobacteria, Firmicutes, Fusobacteria, Gemmatimonadetes, Nitrospirae, Planctomycetes, Proteobacteria, Spirochaetes, Thermotogae, Xenobacteria) Fungi (Ascomycota) Plantae (Tracheophyta) |
| C26.A31 | GMP synthase ( | 72 | Bacteria (Chlamydiae) Archaea (Crenarchaeota, Euryarchaeota) |
| C26.A32 | Imidazoleglycerol-phosphate synthase ( | 176 | Bacteria (Acidobacteria, Bacteroidetes, Chloroflexi, Dictyoglomi, Fibrobacteres, Firmicutes, Fusobacteria, Proteobacteria, Synergistetes, Thermotogae, Xenobacteria) Archaea (Crenarchaeota, Euryarchaeota, Korarchaeota) Fungi (Ascomycota) Plantae (Tracheophyta) Animalia (Chordata) |
| C40.A01 | NlpC protein ( | 60 | Bacteria (Proteobacteria) Animalia (Porifera) |
| C44.A08 | Glutamine-fructose-6-phosphate transaminase precursor ( | 74 | Bacteria (Bacteroidetes, Chlorobi, Chloroflexi, Cyanobacteria, Firmicutes, Proteobacteria, Verrucomicrobia, Xenobacteria) Archaea (Crenarchaeota, Euryarchaeota) Protozoa (Alveolata, Apicomplexa, Microspora, Sarcomastigophora) Plantae (Heterokontophyta, Ochrophyta) Animalia (Arthropoda, Chordata) |
| C56.A06 | DDB_G0276405 ( | 350 | Bacteria (Acidobacteria, Bacteroidetes, Chloroflexi, Cyanobacteria, Deferribacteres, Firmicutes, Ignavibacteria, Lentisphaerae, Planctomycetes, Proteobacteria, Spirochaetes, Thermodesulfobacteria, Verrucomicrobia, Xenobacteria) Archaea (Euryarchaeota) Protozoa (Sarcomastigophora) Fungi (Ascomycota) Plantae (Chlorophyta) Animalia (Arthropoda, Nematoda) |
| C82.A01 | YafK protein ( | 56 | Bacteria (Bacteroidetes, Proteobacteria) |
| C82.A07 | Murein transglycosylase ( | 74 | Bacteria (Cyanobacteria, Firmicutes, Proteobacteria) |
| M03.A08 | BSSC8_09230 protein ( | 234 | Bacteria (Chlamydiae, Chloroflexi, Cyanobacteria, Firmicutes, Planctomycetes, Proteobacteria, Spirochaetes, Thermotogae, Xenobacteria) Plantae (Tracheophyta) |
| M13.A32 | T25B6.2 protein ( | 140 | Bacteria (Bacteroidetes, Firmicutes, Proteobacteria) Fungi (Ascomycota, Basidiomycota) Plantae (Chlorophyta, Rhodophyta, Tracheophyta) Animalia (Arthropoda, Chordata, Echinodermata, Hemichordata, Nematoda) |
| M15.A04 | YokZ protein ( | 274 | Bacteria (Bacteroidetes, Chloroflexi, Cyanobacteria, Firmicutes, Spirochaetes, Xenobacteria) |
| M16.A04 | Insulysin homologue ( | 73 | Fungi (Ascomycota, Basidiomycota) |
| M16.A05 | PqqL protein ( | 103 | Bacteria (Bacteroidetes, Chlorobi, Deferribacteres, Fusobacteria, Gemmatimonadetes, Proteobacteria, Spirochaetes, Xenobacteria) |
| M20.A08 | YgeY protein ( | 61 | Bacteria (Bacteroidetes, Caldiserica, Chloroflexi, Firmicutes, Fusobacteria, Proteobacteria, Spirochaetes, Synergistetes, Thermotogae) |
| M20.A18 | DDB_G0279291 protein ( | 72 | Bacteria (Firmicutes, Proteobacteria) Protozoa (Alveolata, Parabasalidea, Sarcomastigophora) Plantae (Tracheophyta) |
| M20.A21 | YodQ protein ( | 81 | Bacteria (Chloroflexi, Firmicutes, Fusobacteria, Proteobacteria, Spirochaetes) Archaea (Euryarchaeota) |
| M20.A23 | Bsubs1_010100013116 protein ( | 52 | Bacteria (Firmicutes, Fusobacteria, Planctomycetes, Proteobacteria, Synergistetes) |
| M20.A27 | YkuR protein ( | 82 | Bacteria (Firmicutes, Fusobacteria, Thermotogae) |
| M24.A09 | YFR006W protein ( | 57 | Fungi (Ascomycota, Basidiomycota) Plantae (Chlorophyta) |
| M24.A11 | SPBC4F6.19c protein ( | 77 | Bacteria (Acidobacteria, Bacteroidetes, Proteobacteria) Fungi (Ascomycota, Basidiomycota) Plantae (Tracheophyta) |
| M48.A02 | At3g27110 ( | 50 | Bacteria (Cyanobacteria, Firmicutes, Proteobacteria, Spirochaetes) Archaea (Euryarchaeota) Plantae (Tracheophyta) |
| M50.A04 | PF0392 protein ( | 79 | Bacteria (Acidobacteria, Bacteroidetes, Chlorobi, Chloroflexi, Cyanobacteria, Nitrospirae, Planctomycetes, Proteobacteria) Archaea (Crenarchaeota, Euryarchaeota) |
| M50.A05 | YwhC protein ( | 203 | Bacteria (Acidobacteria, Aquificae, Chloroflexi, Chrysiogenetes, Deferribacteres, Dictyoglomi, Firmicutes, Fusobacteria, Nitrospirae, Proteobacteria, Spirochaetes, Synergistetes, Thermodesulfobacteria, Verrucomicrobia, Xenobacteria) Archaea (Nanoarchaeota) |
| M50.A07 | PF0457 protein ( | 76 | Bacteria (Bacteroidetes, Chloroflexi, Cyanobacteria, Firmicutes, Gemmatimonadetes, Proteobacteria, Xenobacteria) Archaea (Crenarchaeota, Euryarchaeota) Plantae (Chlorophyta) |
| M79.A04 | At3g26085 ( | 58 | Bacteria (Chloroflexi, Cyanobacteria, Firmicutes, Proteobacteria, Xenobacteria) Archaea (Euryarchaeota) Plantae (Tracheophyta) |
| M79.A09 | YdiL protein ( | 140 | Bacteria (Bacteroidetes, Chlamydiae, Chloroflexi, Cyanobacteria, Deferribacteres, Firmicutes, Fusobacteria, Proteobacteria, Spirochaetes, Verrucomicrobia) Archaea (Crenarchaeota, Euryarchaeota) Protozoa (Sarcomastigophora) |
| N06.A01 | FlhB protein ( | 450 | Bacteria (Acidobacteria, Aquificae, Chloroflexi, Deferribacteres, Firmicutes, Proteobacteria, Spirochaetes, Synergistetes, Thermotogae) |
| S01.A08 | At5g27660 ( | 58 | Bacteria (Acidobacteria, Bacteroidetes, Caldiserica, Chlamydiae, Chloroflexi, Firmicutes, Fusobacteria, Ignavibacteria, Planctomycetes, Proteobacteria, Spirochaetes, Thermotogae, Xenobacteria) Archaea (Crenarchaeota) Plantae (Heterokontophyta, Tracheophyta) Animalia (Chordata) |
| S09.A43 | YpfH protein ( | 101 | Bacteria (Acidobacteria, Chlamydiae, Chlorobi, Chloroflexi, Cyanobacteria, Firmicutes, Proteobacteria) Fungi (Ascomycota) |
| S09.A77 | dpf-6 protein ( | 334 | Bacteria (Acidobacteria, Bacteroidetes, Chlamydiae, Chloroflexi, Cyanobacteria, Firmicutes, Gemmatimonadetes, Nitrospirae, Planctomycetes, Proteobacteria, Synergistetes, Thermotogae, Verrucomicrobia, Xenobacteria) Archaea (Crenarchaeota, Euryarchaeota) Protozoa (Sarcomastigophora) Fungi (Ascomycota) Plantae (Ochrophyta, Tracheophyta) Animalia (Chordata, Nematoda) |
| S09.B04 | BSU23640 protein ( | 56 | Bacteria (Acidobacteria, Bacteroidetes, Chloroflexi, Firmicutes, Planctomycetes, Proteobacteria, Xenobacteria) Archaea (Crenarchaeota, Euryarchaeota) |
| S10.A47 | At2g22960 ( | 122 | Protozoa (Alveolata, Apicomplexa, Sarcomastigophora) Fungi (Ascomycota, Basidiomycota) Plantae (Chlorophyta, Ochrophyta, Oomycota, Rhodophyta, Streptophyta, Tracheophyta) Animalia (Arthropoda, Chordata, Echinodermata, Hemichordata, Nematoda) |
| S12.A03 | EcHS_A2566 protein ( | 180 | Bacteria (Bacteroidetes, Chlorobi, Chloroflexi, Cyanobacteria, Firmicutes, Fusobacteria, Nitrospirae, Planctomycetes, Proteobacteria, Spirochaetes, Thermotogae, Xenobacteria) Archaea (Euryarchaeota) Fungi (Ascomycota) |
| S12.A23 | GYO_0385 protein ( | 51 | Bacteria (Bacteroidetes, Firmicutes, Proteobacteria, Spirochaetes, Xenobacteria) Archaea (Euryarchaeota) |
| S16.A10 | YcbZ ( | 110 | Bacteria (Firmicutes, Proteobacteria, Spirochaetes) Animalia (Chordata) |
| S16.A12 | PF1438 ( | 270 | Bacteria (Firmicutes, Proteobacteria) Archaea (Crenarchaeota, Euryarchaeota, Korarchaeota) Protozoa (Apicomplexa) |
| S49.A08 | BSn5_05605 protein ( | 189 | Bacteria (Chrysiogenetes, Cyanobacteria, Deferribacteres, Firmicutes, Proteobacteria, Spirochaetes, Thermodesulfobacteria, Xenobacteria) Archaea (Crenarchaeota, Euryarchaeota) |
| S49.A09 | PF0240 protein ( | 67 | Bacteria (Aquificae, Bacteroidetes, Chloroflexi, Cyanobacteria, Deferribacteres, Dictyoglomi, Firmicutes, Nitrospirae, Proteobacteria, Synergistetes, Thermodesulfobacteria, Thermotogae, Xenobacteria) Archaea (Crenarchaeota, Euryarchaeota) |
| S54.A18 | YdcA protein ( | 54 | Bacteria (Firmicutes, Proteobacteria) |
| T03.A09 | GYO_3966 protein ( | 761 | Bacteria (Acidobacteria, Bacteroidetes, Chloroflexi, Cyanobacteria, Firmicutes, Fusobacteria, Lentisphaerae, Planctomycetes, Proteobacteria, Xenobacteria) Archaea (Euryarchaeota) Protozoa (Sarcomastigophora) Fungi (Ascomycota, Basidiomycota, Oomycota) Plantae (Chlorophyta, Heterokontophyta, Ochrophyta, Tracheophyta) Animalia (Annelida, Arthropoda, Chordata, Cnidaria, Placozoa, Porifera) |
| U32.A01 | YhbV protein ( | 131 | Bacteria (Proteobacteria) |
Columns are the MEROPS identifier (linked to the peptidase summary page in the MEROPS database); the name used in the MEROPS database, which is often derived from the gene name; the number of species containing a sequence to which the identifier has been assigned; and the phyla that include these species, grouped by kingdom. When one of these putative peptidases is characterized, the MEROPS identifier will be replaced with an identifier that follows the normal naming convention used in the MEROPS database. The obsolete identifier will not be re-used; so that existing links remain useful, the user will be automatically redirected to the summary page for the replacement identifier.
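The redirection of an obsolete identifier to its replacement can be pictured as a lookup in a mapping. This is a sketch under stated assumptions, not a description of how the MEROPS site is implemented: the `redirects` table and the replacement value `C26.NEW` are hypothetical placeholders.

```python
def resolve_identifier(identifier, redirects):
    """Follow redirects from obsolete identifiers to the current one.

    `redirects` maps an obsolete identifier to its replacement
    (a hypothetical table). Identifiers without an entry are
    returned unchanged.
    """
    seen = set()
    while identifier in redirects:
        if identifier in seen:
            raise ValueError(f"redirect loop at {identifier}")
        seen.add(identifier)
        identifier = redirects[identifier]
    return identifier


# "C26.NEW" is a placeholder, not a real MEROPS identifier.
redirects = {"C26.A05": "C26.NEW"}
```

Because obsolete identifiers are never re-used, each key in such a table would have exactly one meaning, so an old link resolves unambiguously.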
Totals of identifiers for characterized and uncharacterized peptidases
| Family | Type example | Characterized peptidases | Putative peptidases |
|---|---|---|---|
| A1 | Pepsin | 84 | 92 |
| A2 | Retropepsin | 32 | 0 |
| C1 | Papain | 148 | 59 |
| C2 | Calpain | 25 | 10 |
| C19 | Ubiquitin-specific peptidase 14 | 94 | 66 |
| C26 | γ-Glutamyl hydrolase | 1 | 20 |
| C48 | Ulp1 peptidase | 32 | 19 |
| M1 | Aminopeptidase N | 31 | 31 |
| M12 | Astacin | 180 | 51 |
| M13 | Neprilysin | 17 | 32 |
| M14 | Carboxypeptidase A1 | 34 | 34 |
| M16 | Pitrilysin | 17 | 20 |
| M20 | Glutamate carboxypeptidase | 20 | 21 |
| M41 | FtsH peptidase | 22 | 14 |
| S1 | Chymotrypsin | 462 | 180 |
| S8 | Subtilisin | 147 | 58 |
| S9 | Prolyl oligopeptidase | 45 | 100 |
| S10 | Carboxypeptidase Y | 17 | 69 |
| S12 | | 10 | 21 |
| S26 | Signal peptidase I | 27 | 10 |
| S28 | Lysosomal Pro-Xaa carboxypeptidase | 5 | 26 |
| S33 | Prolyl aminopeptidase | 18 | 100 |
| S54 | Rhomboid-1 | 28 | 18 |
| S63 | EGF-like module containing mucin-like hormone receptor-like 2 | 35 | 2 |
| T3 | γ-Glutamyltransferase 1 | 21 | 9 |
The total number of identifiers for characterized and uncharacterized peptidases is shown for all families where there are ≥20 examples in either category. The name of the type example peptidase is given for each family.
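The inclusion rule for this table (at least 20 identifiers in either category) can be sketched as a filter. The first five rows below are copied from the table; the `X99` row and the function names are hypothetical additions that show a family being excluded.

```python
# (family, characterized identifiers, putative identifiers)
ROWS = [
    ("A2", 32, 0),
    ("C26", 1, 20),
    ("M1", 31, 31),
    ("S12", 10, 21),
    ("S63", 35, 2),
    ("X99", 5, 3),  # hypothetical family below the threshold
]


def families_meeting_threshold(rows, threshold=20):
    """Keep families with >= threshold identifiers in either category."""
    return [
        family
        for family, characterized, putative in rows
        if characterized >= threshold or putative >= threshold
    ]


def putative_fraction(characterized, putative):
    """Share of a family's identifiers that are still putative."""
    return putative / (characterized + putative)


selected = families_meeting_threshold(ROWS)
```

A ratio such as `putative_fraction(1, 20)` for family C26 gives a rough sense of how much of a family remains uncharacterized.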