Abstract
Peptidases are enzymes that hydrolyse peptide bonds in proteins and peptides. Peptidases are important in pathological conditions such as Alzheimer's disease, tumour and parasite invasion, and for processing viral polyproteins. The MEROPS database is an Internet resource containing information on peptidases, their substrates and inhibitors. The database now includes details of cleavage positions in substrates, both physiological and non-physiological, natural and synthetic. There are 39 118 cleavages in the collection, including 34 606 from a total of 10 513 different proteins and 2677 cleavages in synthetic substrates. The number of cleavages designated as 'physiological' is 13 307. The data are derived from 6095 publications. At least one substrate cleavage is known for 45% of the 2415 different peptidases recognized in the MEROPS database. The website now has three new displays: two showing peptidase specificity as a logo and a frequency matrix, the third showing a dynamically generated alignment between each protein substrate and its most closely related homologues. Many of the proteins described in the literature as peptidase substrates have been studied only in vitro. On the assumption that a physiologically relevant cleavage site would be conserved between species, the conservation of every site in terms of peptidase preference has been examined and a number have been identified that are not conserved. There are a number of cogent reasons why a site might not be conserved. Each poorly conserved site has been examined and a reason postulated. Some sites are identified that are very poorly conserved where cleavage is more likely to be fortuitous than of physiological relevance. This data set is freely available via the Internet and is a useful training set for algorithms to predict substrates for peptidases and cleavage positions within those substrates.
The data may also be useful for the design of inhibitors and for engineering novel specificities into peptidases.
Database URL: http://merops.sanger.ac.uk
Year: 2009 PMID: 20157488 PMCID: PMC2790309 DOI: 10.1093/database/bap015
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Preference for amino acids in substrate binding sites. The bar chart shows the number of peptidases showing a preference for one or two amino acids for each substrate binding site S4–S4′. Of the 312 peptidases with 10 or more known substrate cleavages, 202 show a preference and are included in the figure. A count is made whenever an amino acid occurs in one binding pocket in 40% or more of the substrates. There are 15 peptidases that have a preference for two amino acids in a binding pocket: walleye dermal sarcoma virus retropepsin (A02.063, Asn or Gln in S2), sapovirus 3C-like peptidase (C24.003, Glu or Gln in S1), SARS coronavirus picornain 3C-like peptidase (C30.005, Gly or Gln in S1), peptidyl-dipeptidase Acer (M02.002, Gly or Pro in S1), vimelysin (M04.010, Phe or Leu in S1), carboxypeptidase M (M14.006, Arg or Lys in S1′), carboxypeptidase U (M14.009, Arg or Lys in S1′), dactylysin (M9G.026, Leu or Phe in S1′), chymase (S01.140, Phe or Tyr in S1), tryptase alpha (S01.143, Lys or Arg in S1), trypsin 1 (S01.151, Lys or Arg in S1), plasmin (S01.233, Lys or Arg in S1), flavivirin (S07.001, Lys or Arg in S2), dipeptidyl aminopeptidase A (S09.005, Ala or Pro in S1) and kumamolisin (S53.004, Glu or Gly in S3). Many peptidases show a preference in more than one binding pocket. There are 13 peptidases with a preference for all eight binding pockets, another 13 with a preference in seven, five peptidases in six, three in five, eight in four, 24 in three, 47 in two and 89 in only one.
Peptidase preference by amino acid type
| Amino acid type | S4 | S3 | S2 | S1 | S1′ | S2′ | S3′ | S4′ |
|---|---|---|---|---|---|---|---|---|
| Acidic | 5 | 7 | 5 | 24 | 5 | 4 | 2 | 5 |
| Basic | 11 | 8 | 13 | 67 | 11 | 9 | 5 | 2 |
| Aliphatic | 22 | 24 | 32 | 18 | 56 | 36 | 23 | 7 |
| Aromatic | 2 | 2 | 8 | 34 | 23 | 7 | 1 | 0 |
| Small | 35 | 34 | 31 | 58 | 65 | 22 | 26 | 20 |
| Total | 75 | 75 | 89 | 201 | 160 | 78 | 57 | 34 |
The number of peptidases with a preference for a particular amino acid type for each binding pocket S4–S4′ is shown, where 40% or more of substrates have an amino acid of that type at that position. Only those 312 peptidases with at least 10 known cleavages are included. There are 276 peptidases that show a preference, of which 18 show a preference at all eight sites, 16 for seven sites, 12 for six sites, 17 for five sites, 33 for four sites, 46 for three sites, 64 for two sites and 70 for one site.
Number of peptidases with an amino acid preference
| Amino acid | S4 | S3 | S2 | S1 | S1′ | S2′ | S3′ | S4′ |
|---|---|---|---|---|---|---|---|---|
| Ala | 6 | 8 | 5 | 10 | 8 | 5 | 1 | |
| Cys | 1 | 1 | ||||||
| Asp | 3 | 16 | 2 | 3 | ||||
| Glu | 1 | 7 | 5 | 8 | 1 | 1 | 2 | |
| Phe | 2 | 1 | 5 | 12 | 10 | 2 | ||
| Gly | 3 | 1 | 11 | 17 | 12 | 2 | 6 | 5 |
| His | 1 | 2 | ||||||
| Ile | 2 | 1 | 8 | 1 | ||||
| Lys | 2 | 4 | 8 | 6 | 6 | 2 | 4 | |
| Leu | 11 | 4 | 9 | 12 | 24 | 4 | 7 | |
| Met | 1 | 6 | ||||||
| Asn | 9 | 1 | ||||||
| Pro | 2 | 8 | 5 | 9 | 9 | 4 | 1 | 4 |
| Gln | 9 | 1 | 5 | 1 | 10 | |||
| Arg | 8 | 1 | 2 | 52 | 5 | 3 | 1 | |
| Ser | 8 | 1 | 8 | 3 | 2 | 1 | ||
| Thr | 3 | 1 | 1 | 1 | ||||
| Val | 1 | 6 | 1 | 2 | 5 | 6 | 11 | 5 |
| Trp | 1 | 1 | ||||||
| Tyr | 11 | 1 | 5 | |||||
The number of peptidases showing a preference for an amino acid in a binding site is shown. Only those 312 peptidases with 10 or more known substrate cleavages are included. An amino acid must occur at that position in 40% or more of substrates. Therefore, it is possible for two amino acids to be preferred in any one binding pocket, as is the case for trypsin 1 where there is a preference for either Lys (59% of substrates) or Arg (41%) in S1. There are 202 peptidases that show a preference, of which 13 show a preference at all eight sites, 13 for seven sites, five for six sites, three for five sites, eight for four sites, 23 for three sites, 49 for two sites and 88 for one site.
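The 40% rule described above can be expressed compactly in code. The following Python sketch is illustrative only: it assumes each cleavage is supplied as an eight-character string covering P4–P4′ in one-letter code, which is a hypothetical input format rather than the MEROPS schema, and the function name `preferred_residues` is invented here.

```python
from collections import Counter

# Binding pockets in the order used by the tables (S4 ... S4').
POCKETS = ["S4", "S3", "S2", "S1", "S1'", "S2'", "S3'", "S4'"]


def preferred_residues(windows, threshold=0.40, min_cleavages=10):
    """Return {pocket: [residues]} for residues meeting the threshold.

    windows: list of 8-character strings, P4..P4' in one-letter code
             (assumed input format, not taken from MEROPS).
    A residue is 'preferred' in a pocket when it occupies that position
    in at least `threshold` of all cleavages. Peptidases with fewer than
    `min_cleavages` known cleavages are skipped, as in the tables.
    """
    if len(windows) < min_cleavages:
        return {}
    prefs = {}
    for i, pocket in enumerate(POCKETS):
        counts = Counter(w[i] for w in windows)
        hits = sorted(aa for aa, n in counts.items()
                      if n / len(windows) >= threshold)
        if hits:
            prefs[pocket] = hits
    return prefs
```

Because the threshold is 40% rather than a majority, at most two residues can qualify in any one pocket. With a toy set of 100 cleavages having Lys at P1 in 59 and Arg in 41, and no dominant residue elsewhere, the function reports both Lys and Arg for S1, mirroring the trypsin 1 case.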
Peptidases showing unique preferences
| Peptidase name | MEROPS identifier | Total substrate cleavages | S4 | S3 | S2 | S1 | S1′ | S2′ | S3′ | S4′ |
|---|---|---|---|---|---|---|---|---|---|---|
| Chymosin | A01.006 | 15 | His | Ser | Ile | |||||
| Feline immunodeficiency virus retropepsin | A02.007 | 28 | Val | |||||||
| Walleye dermal sarcoma virus retropepsin | A02.063 | 27 | Gln | |||||||
| PibD peptidase | A24.017 | 10 | Thr | |||||||
| gpr peptidase | A25.001 | 32 | Met | Ile | Glu | |||||
| Cruzipain | C01.075 | 49 | Arg | |||||||
| Coxsackievirus-type picornain 3C | C03.011 | 10 | Pro | |||||||
| Ubiquitinyl hydrolase-L3 | C12.003 | 14 | Arg | |||||||
| Legumain | C13.004 | 30 | Asn | |||||||
| Sapovirus 3C-like peptidase | C24.003 | 10 | Thr | |||||||
| Separase (yeast-type) | C50.001 | 12 | Glu | |||||||
| Peptidyl-dipeptidase Acer | M02.002 | 10 | Phe | |||||||
| Bacterial collagenase H | M09.003 | 18 | Ala | |||||||
| PrtA peptidase ( | M10.063 | 23 | Glu | |||||||
| ADAM8 peptidase | M12.208 | 22 | Gln | |||||||
| Tryptophanyl aminopeptidase ( | M9A.008 | 15 | Trp | |||||||
| Carboxypeptidase G3 | M9E.007 | 12 | Glu | |||||||
| Mast cell peptidase 4 ( | S01.005 | 33 | Trp | |||||||
| Kumamolisin | S53.004 | 10 | Val | Gly | Tyr | |||||
| Peroxisomal transit peptide peptidase | U9G.062 | 14 | Cys | Cys |
Figure 2. The specificity logo and frequency matrix showing the substrate specificity of caspase-3. The figure is taken from a page in the MEROPS database. The logo is shown at the top with the frequency matrix below. The cleavage pattern is a textual representation of the logo, where the scissile bond is shown as a red cross, and the binding pockets separated by forward slashes. The preferred residue is shown in uppercase if the preference is strong. The number of cleavages on which these data are based is given in parentheses. For the logo, the binding pockets S4–S4′ are shown along the x-axis, where 1 is S4, 2 is S3, etc. The bit score is shown on the y-axis. The height of the letter is proportional to the bit score. The letters are coloured to indicate amino acid properties: blue for basic, red for acidic, black for hydrophobic and green for any other. In the frequency matrix below the logo, each cell shows the number of substrates with an amino acid occupying one of the positions P4–P4′. Cells in the matrix are highlighted in shades of green where the greater the preference, i.e. the more often an amino acid occurs at that position, the brighter the shade. Cells are highlighted in black if the amino acid is unknown at that position for any substrate.
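To make the relationship between the frequency matrix and the logo concrete, the sketch below builds per-position residue counts and a bit score for each position. It is a minimal illustration under stated assumptions: the caption does not say how MEROPS computes the bit score, so the code uses the conventional Shannon information content over 20 amino acids with no small-sample correction, and the input format (eight-character P4–P4′ strings) and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(windows):
    """Count residues at each position; unknown residues are ignored.

    windows: equal-length strings in one-letter code (assumed format).
    Returns one Counter per position.
    """
    width = len(windows[0])
    return [Counter(w[i] for w in windows if w[i] in AMINO_ACIDS)
            for i in range(width)]


def information_bits(column):
    """Information content of one position: log2(20) minus its entropy."""
    total = sum(column.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in column.values() if n > 0)
    return math.log2(len(AMINO_ACIDS)) - entropy


def letter_heights(column):
    """Split a position's bit score among residues by relative frequency."""
    total = sum(column.values())
    if total == 0:
        return {}
    bits = information_bits(column)
    return {aa: bits * n / total for aa, n in column.items()}
```

Under these assumptions a position occupied by a single residue in every substrate scores log2(20), about 4.32 bits, while a position where all 20 residues are equally frequent scores zero.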
Figure 3. Alignment of the protein sequences of orthologues of the mouse BID protein showing known peptidase cleavages. The alignment is highlighted to show conservation of residues around the cleavage of BID by cathepsin H (C01.040) at residue 12. The sequence where the cleavage is known is highlighted in green and residues are numbered according to this sequence (inserts are indicated by letters). The rows beneath the residue numbers show the MEROPS identifier of each peptidase known to cleave this substrate. Arrows indicate the residue range of the fragment used in the experiment, and cleavage positions are indicated by the ‘+’ symbol. Clicking on a MEROPS identifier takes the user to the relevant summary page. Clicking on a ‘+’ symbol causes the alignment to be redrawn with residues P4–P4′ highlighted for that particular cleavage. Residues either side of the cleavage site are highlighted in pink if conserved with the equivalent residue in the sequence where the cleavage is known. A residue is highlighted in orange if it is not conserved but is known to occur in the same binding pocket in another cathepsin H substrate. A residue is shown as white on black if it is not conserved and is not known to occur in the same peptidase substrate binding site in any other substrate.
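The three highlighting classes in this caption amount to a simple decision rule per aligned residue. The sketch below shows one way to express it. The function names, the class labels and the representation of "residues known to occur in a pocket" as a Python set are assumptions made for illustration, and gap characters in the alignment are not handled.

```python
def classify_residue(homologue_aa, reference_aa, observed_in_pocket):
    """Return the highlight class for one aligned residue.

    homologue_aa:       residue in the homologous sequence
    reference_aa:       residue in the sequence where the cleavage is known
    observed_in_pocket: set of residues seen at this position in any known
                        substrate of the same peptidase (assumed input)
    """
    if homologue_aa == reference_aa:
        return "conserved"      # pink in the display
    if homologue_aa in observed_in_pocket:
        return "acceptable"     # orange: seen in another substrate
    return "unobserved"         # white on black


def classify_window(homologue, reference, observed):
    """Classify every position of an aligned P4-P4' window."""
    return [classify_residue(h, r, seen)
            for h, r, seen in zip(homologue, reference, observed)]
```

The order of the tests matters: identity with the reference sequence is checked first, so a conserved residue is reported as conserved even when it also appears in the set of residues observed for that pocket.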
Assumed physiological cleavages that are not conserved in terms of peptidase substrate binding
| Substrate | UniProt accession | P1 | Peptidase [MEROPS identifier] (total known substrates) | Replacements | Possible cause | Ref. |
|---|---|---|---|---|---|---|
| Serine protease HTRA2, mitochondrial | O43464 | 211 | HtrA2 peptidase [S01.278] (56) | ( | ||
| Cytochrome C | P00022 | 1 | mitochondrial methionyl aminopeptidase [M24.028] (131) | ( | ||
| Coagulation factor XIII A chain | P00488 | 38 | thrombin [S01.217] (169) | ( | ||
| Insulin-1 | P01325 | 87 | proprotein convertase 2 [S08.073] (59) | ( | ||
| Collagen alpha-2(I) chain | P02465 | 870 | cathepsin D [A01.009] (145) | ( | ||
| Collagen alpha-2(I) chain | P02465 | 863 | matrix metallopeptidase-1 [M10.001] (70) | ( | ||
| Collagen alpha-2(I) chain | P02465 | 863 | matrix metallopeptidase-8 [M10.002] (87) | ( | ||
| Platelet-derived growth factor subunit A | P04085 | 86 | Furin [S08.071] (116) | ( | ||
| Collagen alpha-2(IV) chain | P08572 | 1077 | kallikrein-related peptidase 14 [S01.029] (49) | ( | ||
| Collagen alpha-2(IV) chain | P08572 | 1109 | kallikrein-related peptidase 14 [S01.029] (49) | ( | ||
| Insulin-like growth factor-binding protein 1 | P08833 | 165 | Matriptase [S01.302] (26) | ( | ||
| Acyl-CoA thioesterase I | P0ADA1 | 26 | Signal peptidase I [S26.001] (294) | ( | ||
| Protein ygiW | P0ADU5 | 20 | Signal peptidase I [S26.001] (294) | ( | ||
| Chymotrypsin inhibitor 3 | P10822 | 24 | Signalase (animal) 21 kDa component [S26.010] (363) | ( | ||
| Plastocyanin minor isoform, chloroplastic | P11490 | 72 | Thylakoidal processing peptidase [S26.008] (52) | ( | ||
| 50S ribosomal protein L7Ae | P12743 | 1 | Methionyl aminopeptidase 2 [M24.002] (130) | ( | ||
| Beta-crystallin B3 | P19141 | 4 | Calpain-1 [C02.001] (101) | ( | ||
| Beta-crystallin B3 | P19141 | 10 | Calpain-1 [C02.001] (101) | ( | ||
| 1-phosphatidylinositol-4,5-bisphosphate phosphodiesterase gamma-1 | P19174 | 770 | Caspase-7 [C14.004] (112) | ( | ||
| Mimecan | P20774 | 219 | ADAMTS4 peptidase [M12.221] (57) | ( | ||
| Mimecan | P20774 | 234 | ADAMTS4 peptidase [M12.221] (57) | ( | ||
| Trypsin inhibitor 2 | P26780 | 30 | Signalase (animal) 21 kDa component [S26.010] (363) | ( | ||
| 60S ribosomal protein L10 | P27635 | 180 | Granzyme B ( | ( | ||
| Chitinase 2 | P29027 | 22 | Signalase (animal) 21 kDa component [S26.010] (363) | ( | ||
| Alpha-synuclein | P37840 | 122 | Calpain-1 [C02.001] (101) | ( | ||
| Cathepsin E | P43159 | 53 | Cathepsin E [A01.010] (64) | ( | ||
| 40S ribosomal protein S25 | P62852 | 51 | Granzyme B, rodent-type [S01.136] (231) | ( | ||
| Hemoglobin subunit alpha | P69905 | 37 | Cathepsin D [A01.009] (145) | ( | ||
| Hemoglobin subunit alpha | P69905 | 109 | Cathepsin D [A01.009] (145) | ( | ||
| Hemoglobin subunit alpha | P69905 | 110 | Cathepsin D [A01.009] (145) | ( | ||
| ABC transporter periplasmic-binding protein yphF | P77269 | 26 | Signal peptidase I [S26.001] (294) | ( | ||
| Tyrosine-protein phosphatase non-receptor type 18 | Q61152 | 424 | Caspase-1 [C14.001] (60) | ( | ||
| Cartilage intermediate layer protein 2 | Q8IUL8 | 810 | ADAMTS5 peptidase [M12.225] (38) | ( | ||
| Cartilage intermediate layer protein 2 | Q8IUL8 | 811 | ADAMTS5 peptidase [M12.225] (38) | ( | ||
| Cartilage intermediate layer protein 2 | Q8IUL8 | 813 | ADAMTS5 peptidase [M12.225] (38) | ( | ||
| Cartilage intermediate layer protein 2 | Q8IUL8 | 830 | ADAMTS5 peptidase [M12.225] (38) | ( | ||
| Cartilage intermediate layer protein 2 | Q8IUL8 | 832 | ADAMTS5 peptidase [M12.225] (38) | ( | ||
| Probable FKBP-type peptidyl-prolyl cis-trans isomerase 1, chloroplastic | Q9LM71 | 71 | Thylakoidal processing peptidase [S26.008] (52) | ( | ||
The substrate name, UniProt accession, number of the residue occupying the P1 position in the known cleavage, the peptidase performing the cleavage (with MEROPS identifier in square brackets and the total of known substrates for the peptidase in parentheses), the sequence occupying P4–P4′ in the known cleavage and replacements unobserved in other substrates, the possible cause (a–h, see text for details), and the reference describing the cleavage are given. The numbers in parentheses after the sequence are the number of homologues where the cleavage site is conserved (those identical to the known cleavage plus acceptable replacements) and the number of sequences where a replacement has occurred that has not been observed in any substrate for the peptidase. A hyphen indicates a conserved amino acid or an acceptable replacement, and an ‘x’ indicates a gap character inserted in the alignment. A space indicates where no amino acid is possible (e.g. in P4, P3 and P2 for an aminopeptidase cleavage). Data are arranged by UniProt accession and the P1 position.
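The notation in this legend can be reproduced with a short routine, sketched below under explicit assumptions. The input is taken to be aligned P4–P4′ windows in which `-` marks an alignment gap and a space in the reference marks a position where no amino acid is possible. The function names are hypothetical. The legend does not say how homologues containing gaps are counted, so this sketch counts them in neither total unless they also contain an unobserved replacement.

```python
def mask_homologue(homologue, reference, observed, gap="-"):
    """Render one homologue window in the table's notation.

    '-' in the output: conserved residue or acceptable replacement
    'x' in the output: gap character in the input alignment
    ' ' in the output: no amino acid possible at this position
    any letter:        replacement not observed in any known substrate
    """
    out = []
    for h, r, seen in zip(homologue, reference, observed):
        if r == " ":
            out.append(" ")
        elif h == gap:
            out.append("x")
        elif h == r or h in seen:
            out.append("-")
        else:
            out.append(h)
    return "".join(out)


def tally(homologues, reference, observed):
    """Return (conserved, unobserved) homologue counts.

    Gapped windows without an unobserved replacement fall into neither
    count; this is an assumption of the sketch, not a documented rule.
    """
    masked = [mask_homologue(h, reference, observed) for h in homologues]
    conserved = sum(1 for m in masked if set(m) <= {"-", " "})
    unobserved = sum(1 for m in masked if any(c not in "-x " for c in m))
    return conserved, unobserved
```

For example, with a reference window `DEVDGSAA` in which Ile is also known at P2 and Ser at P1′, the homologue `DEIDSSAA` is rendered as `--------` and counted as conserved, whereas `DEWDGSAA` is rendered as `--W-----` and counted as carrying an unobserved replacement.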