Qinwei Zhuang, Brandon Alexander Holt, Gabriel A Kwong, Peng Qiu.
Abstract
Proteases are multifunctional, promiscuous enzymes that degrade proteins as well as peptides and drive important processes in health and disease. Current technology has enabled the construction of libraries of peptide substrates that detect protease activity, which provides valuable biological information. An ideal library would be orthogonal, such that each protease hydrolyzes only one unique substrate; however, this is impractical due to off-target promiscuity (i.e., one protease targets multiple different substrates). Therefore, when a library of probes is exposed to a cocktail of proteases, each protease activates multiple probes, producing a convoluted signature. Computational methods for parsing these signatures to estimate individual protease activities primarily use an extensive collection of all possible protease-substrate combinations, which requires impractical amounts of training data when expanding to search for more candidate substrates. Here we provide a computational method for estimating protease activities efficiently by reducing the number of substrates and clustering proteases with similar cleavage activities into families. We envision that this method will be used to extract meaningful diagnostic information from biological samples.
Year: 2019 PMID: 31479443 PMCID: PMC6743790 DOI: 10.1371/journal.pcbi.1006909
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 5. Hierarchical clustering of individual proteases led to 4 families.
MASP2 + CFI + CFD formed the most strongly correlated family, C1r + C1s formed a moderately correlated family, and F11 and F2 each formed its own family.
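The grouping described in this caption can be illustrated with a small sketch of agglomerative clustering on cleavage profiles. This is not the authors' code. The protease names come from the caption, but the profile values are invented toy numbers (not measurements), and the correlation distance, the average linkage, and the helper names `pearson`, `distance`, `average_linkage`, and `cluster` are assumptions made for illustration only.

```python
import math

# Toy cleavage profiles: one row per protease, one value per substrate.
# ASSUMPTION: these numbers are invented for illustration, not measured data.
profiles = {
    "MASP2": [0.90, 0.10, 0.80, 0.20, 0.10, 0.70, 0.30],
    "CFI":   [0.80, 0.20, 0.90, 0.10, 0.20, 0.60, 0.30],
    "CFD":   [0.85, 0.15, 0.75, 0.25, 0.10, 0.75, 0.20],
    "C1r":   [0.10, 0.90, 0.20, 0.70, 0.30, 0.10, 0.60],
    "C1s":   [0.30, 0.70, 0.10, 0.90, 0.10, 0.30, 0.40],
    "F11":   [0.10, 0.10, 0.20, 0.10, 0.90, 0.20, 0.10],
    "F2":    [0.20, 0.30, 0.10, 0.20, 0.10, 0.20, 0.90],
}


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance(a, b):
    """Correlation distance: 0 for identical shapes, up to 2 for opposite ones."""
    return 1.0 - pearson(profiles[a], profiles[b])


def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster(names, n_families):
    """Merge the two closest clusters until n_families remain."""
    clusters = [[n] for n in names]
    while len(clusters) > n_families:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# ASSUMPTION: the tree is cut at 4 families, the number reported in the caption.
families = cluster(list(profiles), n_families=4)
for members in families:
    print(" + ".join(members))
```

With these toy profiles, proteases with similarly shaped cleavage patterns merge first, while proteases whose profiles correlate poorly with all others remain as single-member families.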
Mass-barcoded peptide substrate sequences.
Table describing the sequences of the mass-barcoded peptide substrates along with their chemical modifications. ANP was used as a photocleavable linker to enable rapid detachment from the microparticles. 5-FAM was used for rapid quantification via fluorescence. Isotope enrichment modifications were used to distinguish mass barcodes for quantification with mass spectrometry.
| Substrate Name | Peptide sequence (N terminus on left) | Modifications |
|---|---|---|
| CC01 | e(*aa)(*aa)ndneeGFFsAr(ANP)K(5-FAM)GGLQRIYKC | 1st *aa = Gly(13C2); 2nd *aa = Val(U13C5,15N) |
| CC02 | eG(*aa)ndneeGF(*aa)s(*aa)r(ANP)K(5-FAM)GGKSVARTLLVKC | 1st *aa = Val(U13C5,15N); 2nd *aa = Phe(15N); 3rd *aa = Ala(15N) |
| CC03 | e(*aa)(*aa)ndneeGFFs(*aa)r(ANP)K(5-FAM)GGQRQRIIGGC | 1st *aa = Gly(U13C2,15N); 2nd *aa = Val(15N); 3rd *aa = Ala(U13C3,15N) |
| CC04 | e(*aa)Vndnee(*aa)FFs(*aa)r(ANP)K(5-FAM)GGKYLGRSYKVC | 1st *aa = Gly(13C2); 2nd *aa = Gly(13C2); 3rd *aa = Ala(U13C3,15N) |
| CC05 | eGVndnee(*aa)(*aa)Fs(*aa)r(ANP)K(5-FAM)GGGLQRALEIC | 1st *aa = Gly(U13C2,15N); 2nd *aa = Phe(15N); 3rd *aa = Ala(U13C3,15N) |
| CC06 | e(*aa)(*aa)ndnee(*aa)(*aa)(*aa)s(*aa)r(ANP)K(5-FAM)GGKTTGGRIYGGC | 1st *aa = Gly(13C2); 2nd *aa = Val(U13C5,15N); 3rd *aa = Gly(U13C2,15N); 4th *aa = Phe(15N); 5th *aa = Phe(15N); 6th *aa = Ala(15N); still include ANP and K5-FAM |
| CC07 | eG(*aa)ndnee(*aa)(*aa)Fs(*aa)r(ANP)K(5-FAM)GGQARGGSC | 1st *aa = Val(U13C5,15N); 2nd *aa = Gly(U13C2,15N); 3rd *aa = Phe(15N); 4th *aa = Ala(U13C3,15N) |
*ANP = Photocleavable linker 3-Amino-3-(2-nitrophenyl)propionic acid
*5-FAM = 5-Carboxyfluorescein
**Modifications represent heavy amino acids (i.e., isotope enrichment)
The first row gives the true α for P mixtures, each of which contains N proteases.
The second row gives the estimated α. The third row gives the RMSE for each individual protease (R1, …, RN), and the overall RMSE is the average of all individual RMSEs. In the simulation setting, P is the number of repetitions; here P = 200.
| | Protease 1 | Protease 2 | … | Protease N |
|---|---|---|---|---|
| True | | | … | |
| Estimated | | | … | |
| RMSE (Protease) | | | … | |
| RMSE (overall) | | | | |
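The RMSE layout in this table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each protease's RMSE is taken over the P mixtures, and the function names `per_protease_rmse` and `overall_rmse` and the toy α values are hypothetical.

```python
import math


def per_protease_rmse(true_alpha, est_alpha):
    """Return [R1, ..., RN], one RMSE per protease.

    Both arguments are lists of P rows (mixtures), each with N values
    (one per protease). ASSUMPTION: each RMSE is taken over the P mixtures.
    """
    n_mixtures = len(true_alpha)
    n_proteases = len(true_alpha[0])
    rmses = []
    for j in range(n_proteases):
        squared_error = sum(
            (est_alpha[p][j] - true_alpha[p][j]) ** 2 for p in range(n_mixtures)
        )
        rmses.append(math.sqrt(squared_error / n_mixtures))
    return rmses


def overall_rmse(true_alpha, est_alpha):
    """Average of the individual per-protease RMSEs."""
    rmses = per_protease_rmse(true_alpha, est_alpha)
    return sum(rmses) / len(rmses)


# Toy example with P = 2 mixtures and N = 2 proteases (invented values).
true_alpha = [[1.0, 2.0], [3.0, 4.0]]
est_alpha = [[1.0, 2.5], [3.0, 3.5]]

per_protease = per_protease_rmse(true_alpha, est_alpha)  # [0.0, 0.5]
overall = overall_rmse(true_alpha, est_alpha)            # 0.25
print(per_protease, overall)
```

In the toy example the first protease is estimated exactly, the second is off by 0.5 in both mixtures, and the overall RMSE is the mean of the two per-protease values.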