Lei Li, Chenggang Wu, Haiming Huang, Kaizhong Zhang, Jacob Gan, Shawn S-C Li.
Abstract
Systematic identification of binding partners for modular domains such as Src homology 2 (SH2) is important for understanding the biological function of the corresponding SH2 proteins. We have developed a World Wide Web-accessible computer program, dubbed SMALI (scoring matrix-assisted ligand identification), for SH2 domains and other signaling modules. The current version of SMALI harbors 76 unique scoring matrices for SH2 domains, derived from screening oriented peptide array libraries. These scoring matrices are used to search a protein database for short peptides preferred by an SH2 domain. An experimentally determined cut-off value is used to normalize an SMALI score, thereby allowing direct comparison of peptide-binding potential across different SH2 domains. SMALI employs scoring matrices distinct from those of Scansite, a popular motif-scanning program. Moreover, SMALI contains built-in filters for phosphoproteins, Gene Ontology (GO) correlation and colocalization of subject and query proteins. Compared to Scansite, SMALI exhibited improved accuracy in identifying binding peptides for SH2 domains. Applying SMALI to a group of SH2 domains identified hundreds of interactions that overlap significantly with known networks mediated by the corresponding SH2 proteins, suggesting that SMALI is a useful tool for facile identification of signaling networks mediated by modular domains that recognize short linear peptide motifs.
Year: 2008 PMID: 18424801 PMCID: PMC2425477 DOI: 10.1093/nar/gkn161
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic representation of the SMALI program. (A) An OPAL-SH2 binding profile (shown here for the BRDG1 SH2 domain) was used to generate a position-specific scoring matrix (PSSM) (B). (C) The PSSM was used to search a protein database for tyrosine-containing peptides that are preferred by a query SH2 domain. (D) Selected peptides are ranked according to their SMALI scores and output either unfiltered or after passing through one or more filters as shown. (E) The output file size can be selected. A sample output file is shown (see text for details).
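The search-and-rank steps in panels (C) and (D) can be sketched as follows. This is a minimal illustration, not the SMALI implementation: the additive scoring rule, the matrix weights in `TOY_PSSM`, and the function names `score_window` and `scan_sequence` are all assumptions made for the example.

```python
# Hypothetical toy PSSM: offset relative to the Tyr -> {residue: weight}.
# Real SMALI matrices are derived from OPAL screens; these weights are invented.
TOY_PSSM = {
    1: {"V": 1.0, "E": 0.6},
    2: {"N": 2.0},
    3: {"V": 0.5, "I": 0.4},
}


def score_window(seq, y_index, pssm, default=0.0):
    """Sum the PSSM weights of the residues flanking the Tyr at y_index.

    Additive scoring is an assumption; the source does not state the rule.
    """
    total = 0.0
    for offset, weights in pssm.items():
        pos = y_index + offset
        if 0 <= pos < len(seq):
            total += weights.get(seq[pos], default)
    return total


def scan_sequence(seq, pssm):
    """Return (1-based Tyr position, score) pairs, highest score first."""
    hits = [
        (i + 1, score_window(seq, i, pssm))
        for i, residue in enumerate(seq)
        if residue == "Y"
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    for position, score in scan_sequence("AYAAYVNI", TOY_PSSM):
        print(position, score)
```

Scanning every tyrosine and sorting by score mirrors the ranking step; the filters in panel (D) would be applied to the sorted list afterwards.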
Figure 2. Sample output of the domain-scan module in SMALI. (A) A query protein can be entered by ID or by typing the sequence into the space provided. A partial sequence is also acceptable. One or more SH2 domains in the pull-down menu may be selected for the prediction. (B) Tabulated results showing the query protein name, sequence, locations of Tyr residues and the SH2 domains predicted to bind a particular Tyr site (assuming the site is phosphorylated). A relative SMALI score is given in parentheses beside each selected SH2 domain. Only SH2 domains with a relative score of >1.0 are listed.
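The relative-score filter described in panel (B) might look like the sketch below. It assumes that a relative score is the raw SMALI score divided by the domain's cut-off; the source says only that the cut-off is used to normalize the score, so the division is an assumption. The cut-off values are the ones quoted elsewhere in this article, while the raw scores in the usage example and the function names are hypothetical.

```python
# Cut-off values quoted in this article for five SH2 domains.
CUTOFFS = {"BRDG1": 1.40, "GRB2": 1.60, "NCK1": 1.40, "CRK": 1.65, "FGR": 1.35}


def relative_score(raw_score, domain, cutoffs=CUTOFFS):
    """Assumed normalization: raw score divided by the domain's cut-off."""
    return raw_score / cutoffs[domain]


def predicted_binders(raw_scores, cutoffs=CUTOFFS, threshold=1.0):
    """Keep domains whose relative score exceeds the threshold, best first."""
    relative = {
        domain: relative_score(score, domain, cutoffs)
        for domain, score in raw_scores.items()
    }
    kept = [(d, round(r, 2)) for d, r in relative.items() if r > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical raw scores of one Tyr site against three SH2 domains.
    print(predicted_binders({"GRB2": 2.4, "CRK": 1.5, "FGR": 1.62}))
```

Under this normalization a relative score of 1.0 corresponds to the cut-off itself, which is why scores from different domains become directly comparable.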
Figure 3. Validation of SMALI-predicted interactions by peptide array and derivation of cut-off SMALI values. (A) Binding profile of the BRDG1 SH2 domain to an array of 1488 top-ranked phosphotyrosine-containing peptides selected by SMALI from the Swiss-Prot human protein database. (B) Binding of the GRB2 SH2 domain to 720 phosphopeptides taken from the Phosphosite database (15). The first 360 peptides (upper portion) were selected based on SMALI prediction, whereas the second 360 (lower portion) were randomly chosen from the database. Dark spots indicate positive binding. (C and D) Distribution of binding peptides over SMALI scores for the BRDG1 (C) and GRB2 (D) SH2 domains. The histograms show the 'hit rate', defined as the percentage of binding peptides, at a given SMALI score range (in increments of 0.1 and 0.2 for C and D, respectively). (E and F) An optimal SMALI cut-off value is arbitrarily defined as the SMALI score that produces the greatest F-measure. F-measure = 2 × precision × recall / (precision + recall), where precision = (binding peptides correctly predicted) / (binding peptides predicted) and recall = (binding peptides correctly predicted) / (real binding peptides). For the BRDG1 SH2 domain, the SMALI score of 1.4 produced the largest F-measure, 0.84 (E). Coincidentally, this SMALI value corresponds to a hit rate of ∼50%. For the GRB2 SH2 domain, the cut-off SMALI score is 1.6 (F). (G and H) Distribution of all Tyr-containing peptides (total 203 494) in the Swiss-Prot human database according to SMALI scores calculated using the PSSM for the BRDG1 (G) or GRB2 (H) SH2 domain. The SMALI cut-off of 1.4 for the BRDG1 SH2 domain corresponds to the top 3.5% of scoring peptides, located to the right of the cut-off value (G). For the GRB2 SH2 domain, the cut-off corresponds to the top 5.5% of peptides ranked according to SMALI score (H).
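The cut-off selection described for panels (E) and (F) can be illustrated with a short sketch. It assumes a peptide is predicted to bind when its score is at or above the candidate cut-off (the source does not say whether the comparison is inclusive); the function names and the toy data are hypothetical.

```python
def f_measure(scores, is_binder, cutoff):
    """F-measure when peptides scoring >= cutoff are predicted to bind."""
    predicted = [score >= cutoff for score in scores]
    true_positives = sum(p and b for p, b in zip(predicted, is_binder))
    if true_positives == 0:
        return 0.0
    precision = true_positives / sum(predicted)  # correct / predicted binders
    recall = true_positives / sum(is_binder)     # correct / real binders
    return 2 * precision * recall / (precision + recall)


def best_cutoff(scores, is_binder, candidates):
    """Return the candidate cut-off that gives the largest F-measure."""
    return max(candidates, key=lambda c: f_measure(scores, is_binder, c))


if __name__ == "__main__":
    # Invented scores and array outcomes for six peptides.
    scores = [2.0, 1.8, 1.6, 1.4, 1.2, 1.0]
    binds = [True, True, True, False, True, False]
    print(best_cutoff(scores, binds, [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]))
```

Raising the cut-off trades recall for precision, so the F-measure typically peaks at an intermediate score, which is the value taken as the cut-off.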
Table 1. Known GRB2 SH2-peptide interactions re-examined in the peptide array experiment
| SH2 Protein (Alias) | Description | pY site | pY-peptide | SMALI score | Peptide array | References |
|---|---|---|---|---|---|---|
| BCR_HUMAN (Bcr) | Breakpoint cluster region protein | 177 | KPFpYVNVEF | 2.67 | + | ( |
| IRS1_RAT (Irs1) | Insulin receptor substrate 1 | 895 | PGEpYVNIEF | 2.61 | + | ( |
| FAK2_HUMAN (PYK2) | Focal adhesion kinase 2 | 881 | DLVpYLNVME | 2.53 | + | ( |
| ERBB2_HUMAN (ErbB2) | Receptor tyrosine-protein kinase erbB-2 | 1139 | QPEpYVNQPD | 2.51 | + | ( |
| FAK1_HUMAN (FAK) | Focal adhesion kinase | 925 | DKVpYENVTG | 2.43 | + | ( |
| SHC1_HUMAN (Shc) | SHC-transforming protein 1 | 427 | DPSpYVNVQN | 2.42 | + | ( |
| VGFR1_HUMAN (VEGFR-1) | Vascular endothelial growth factor receptor 1 | 1213 | DVRpYVNAFK | 2.41 | + | ( |
| PGFRB_HUMAN (PDGFR-β) | Beta-type platelet-derived growth factor receptor | 716 | AELpYSNALP | 2.40 | + | ( |
| LAT_MOUSE (LAT) | Linker for activation of T-cells family member 1 | 175 | IDDpYVNVPE | 2.38 | + | ( |
| TIE2_HUMAN (TIE2) | Angiopoietin-1 receptor | 1102 | RKTpYVNTTL | 2.35 | + | ( |
| LAT_MOUSE (LAT) | Linker for activation of T-cells family member 1 | 235 | APDpYENLQE | 2.24 | + | ( |
| PTN11_HUMAN (Ptpn11) | Tyrosine-protein phosphatase non-receptor type 11 | 546 | GHEpYTNIKY | 1.94 | + | ( |
| SHC1_HUMAN (Shc) | SHC-transforming protein 1 | 349 | DHQpYYNDFP | 1.86 | + | ( |
(a) Protein names are according to Swiss-Prot convention, with the commonly used alias given in parentheses.
(b) Peptides showing positive binding in the array (Figure 3B) are identified with '+'. See Methods section for experimental details.
Figure 4. Validation of peptide ligands for the SH2 domains of CRK (A), NCK (B) and FGR (C), as identified by SMALI (upper half of each peptide array) or Scansite (bottom half). For each SH2 domain, a total of 336 peptides were examined, of which the first 168 were identified as top binders by SMALI and the last 168 by Scansite. The sequences of the peptides and their respective rankings by SMALI or Scansite are provided in Tables S3–S5. See also Table 2 for a summary of the results.
Table 2. Accuracy of prediction for SH2-binding peptides by SMALI or Scansite
| SH2 domain | SMALI score cut-off | SMALI: SMALI score (average, SD) | SMALI: hit rate (%) | Scansite: SMALI score (average, SD) | Scansite: hit rate (%) |
|---|---|---|---|---|---|
| NCK1 | 1.40 | 2.02, 0.11 | 40 | 1.73, 0.29 | 15 |
| CRK | 1.65 | 2.19, 0.08 | 90 | 1.64, 0.28 | 32 |
| FGR | 1.35 | 1.84, 0.10 | 98 | 1.49, 0.30 | 87 |
(a) Peptides with spot values >0.8 are defined as binding peptides for the NCK1 SH2 domain, >0.7 for the CRK SH2 domain and >0.4 for the FGR SH2 domain, based on the distribution of spot values in a peptide array experiment (see Materials and Methods section for details; see also Figure 4 and Tables S3–S5 for experimental data).
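The hit rates in the table follow from the spot-value thresholds in the footnote. A minimal sketch, using the thresholds quoted above and invented spot values (the function name is hypothetical):

```python
# Spot-value thresholds quoted in the footnote for each SH2 domain.
SPOT_THRESHOLDS = {"NCK1": 0.8, "CRK": 0.7, "FGR": 0.4}


def hit_rate(spot_values, domain, thresholds=SPOT_THRESHOLDS):
    """Percentage of peptides whose spot value exceeds the domain threshold."""
    if not spot_values:
        return 0.0
    hits = sum(value > thresholds[domain] for value in spot_values)
    return 100.0 * hits / len(spot_values)


if __name__ == "__main__":
    spots = [0.9, 0.5, 0.75, 0.3]  # invented spot intensities
    for domain in SPOT_THRESHOLDS:
        print(domain, hit_rate(spots, domain))
```

Because each domain has its own threshold, the same set of spot values yields a different hit rate per domain.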
Table 3. Overlap between SMALI-predicted SH2-ligand interactions and those listed in PPI databases
| SH2 domain classification | SH2-containing proteins | SH2-interacting proteins predicted by SMALI | SH2-interacting proteins included in PPI databases | Intersection between SMALI and PPI space | Overlap of SMALI network with PPI databases (%) | Statistical significance of overlap |
|---|---|---|---|---|---|---|
| IA | SRC | 298 | 104 | 63 | 15 (23.8) | <0.0004 |
| IA | LYN | 204 | 69 | 50 | 13 (26.0) | <0.00001 |
| IA | ABL1 | 253 | 63 | 46 | 11 (23.9) | <0.0006 |
| IA | FYN | 313 | 99 | 69 | 14 (20.3) | <0.006 |
| IB | CRK | 395 | 44 | 35 | 14 (40.0) | <0.00005 |
| IB | CRKL | 274 | 40 | 30 | 9 (30.0) | <0.0006 |
| IC | GRB2 | 420 | 383 | 250 | 68 (27.2) | <0.00001 |
| IC | GRAP2 | 308 | 27 | 18 | 7 (38.9) | <0.0009 |
| IIA | PIK3R1 | 317 | 98 | 73 | 35 (49.3) | <0.00001 |
| IIA | PTPN11 | 288 | 67 | 59 | 20 (33.9) | <0.00001 |
| IIA | VAV1 | 170 | 54 | 45 | 13 (28.9) | <0.00001 |
| IIB | SHC1 | 275 | 98 | 77 | 22 (28.6) | <0.00001 |
(a) The SH2 domain classification is based on (13). Group IA has a common motif poY−−φ, IB has poYxxφ, IC has poYxNx, IIA has poYφxφ and IIB has the motif poY[E/D/x]xφ, where '−' denotes a negatively charged residue, φ denotes a hydrophobic residue and x is any residue.
(b) Number of proteins predicted by SMALI, with a relative score >1.0, to bind to a specific SH2 domain-containing protein in the table. The phosphorylation filter was applied in the prediction.
(c) Number of binding proteins for a specific SH2-containing protein according to the PPI databases I2D (24) and IntAct (22).
(d) SMALI space is the number of proteins (3253) used in the prediction. These include all proteins listed in the PhosphoSite and Phospho.ELM databases that contain a pTyr. Intersection is defined as the protein space covered by both SMALI and the PPI databases.
(e) Number of common interactions shared between the PPI databases and the SMALI prediction for a given SH2-containing protein. The percentage of overlap (in parentheses) is calculated by dividing this number by the intersected space between PPI and SMALI.
(f) Observed overlap over that expected by chance.
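The overlap percentage defined in the footnotes is a simple ratio, shown below with values from the CRK and SRC rows. The source does not state which test produced the significance column, so the hypergeometric tail in `hypergeom_tail` is only one plausible way to compare an observed overlap with chance and is an assumption, as are both function names.

```python
from math import comb


def overlap_percent(shared, intersection):
    """Shared interactions as a percentage of the SMALI/PPI intersection."""
    return round(100.0 * shared / intersection, 1)


def hypergeom_tail(universe, predicted, known, shared):
    """P(X >= shared) for a hypergeometric draw (assumed test, not from the source).

    `known` proteins are drawn from `universe`, of which `predicted`
    are SMALI hits; X counts how many draws are SMALI hits.
    """
    upper = min(predicted, known)
    favourable = sum(
        comb(predicted, k) * comb(universe - predicted, known - k)
        for k in range(shared, upper + 1)
    )
    return favourable / comb(universe, known)


if __name__ == "__main__":
    print(overlap_percent(14, 35))  # CRK row
    print(overlap_percent(15, 63))  # SRC row
    # SMALI space of 3253 proteins, CRK row counts.
    print(hypergeom_tail(3253, 395, 35, 14))
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for the small tail probabilities involved.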