Umesh K Nandal, Wytze J Vlietstra, Carsten Byrman, Rienk E Jeeninga, Jeffrey H Ringrose, Antoine H C van Kampen, Dave Speijer, Perry D Moerland.
Abstract
BACKGROUND: Two-dimensional differential gel electrophoresis (2D-DIGE) is a powerful technique for separating proteins by isoelectric point and apparent molecular mass and for quantifying changes in protein expression. Abundant proteins in spots can be identified using mass spectrometry-based approaches. However, identification is often not possible for low-abundance proteins.
Year: 2015 PMID: 25627479 PMCID: PMC4384356 DOI: 10.1186/s12859-015-0455-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Prioritization of candidate proteins based on pI and Mw. Step 1: pI and Mw (Da) of the mature forms of the proteins identified by PMF are determined using the ExPASy tool “Compute pI/Mw” [13]. Step 2: The (x,y) coordinates of the identified spots and their corresponding pI and Mw (on log10-scale) are used as training data for fitting two cubic smoothing splines. Step 3: For an unidentified test spot u, a candidate list of proteins is generated using the ExPASy tool TagIdent [14] by specifying ranges Δ and δ (%) around the pI and Mw predicted by the smoothing splines, respectively. Step 4: Proteins in the candidate list are ranked by calculating their similarities with the PMF-identified ‘seed’ proteins using STRING association scores. Step 5 (optional): The ranked candidate list can be further filtered using presence (black) and absence (white) calls from the Gene Expression Barcode 3.0 [15]. A protein is excluded from the ranked list if the corresponding gene is expressed on none of the selected microarrays.
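Steps 3 to 5 of the caption above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Protein` class and all function names are hypothetical, the ranges are read as symmetric windows (pI ± Δ, Mw ± δ%), and the similarity to the seed proteins is assumed to be a plain sum of association scores, which the caption does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    name: str
    pi: float  # theoretical isoelectric point
    mw: float  # theoretical molecular mass in Da


def candidate_window(proteins, pred_pi, pred_mw, delta_pi, delta_mw_pct):
    """Step 3 (sketch): keep proteins inside pI +/- delta_pi and Mw +/- delta_mw_pct %."""
    mw_tol = pred_mw * delta_mw_pct / 100.0
    return [
        p for p in proteins
        if abs(p.pi - pred_pi) <= delta_pi and abs(p.mw - pred_mw) <= mw_tol
    ]


def rank_by_seed_association(candidates, seeds, scores):
    """Step 4 (sketch): rank candidates by summed association score with the seeds.

    `scores` maps frozenset({name_a, name_b}) to an association score.
    Summing the scores is an assumption made for this example.
    """
    def total(p):
        return sum(scores.get(frozenset((p.name, s)), 0.0) for s in seeds)

    return sorted(candidates, key=total, reverse=True)


def expression_filter(ranked, expressed_genes):
    """Step 5 (sketch): drop proteins whose gene is expressed on none of the arrays."""
    return [p for p in ranked if p.name in expressed_genes]


# Toy data, invented for illustration only.
proteins = [
    Protein("A", 5.0, 50000.0),
    Protein("B", 5.2, 52000.0),
    Protein("C", 7.5, 50000.0),
]
scores = {frozenset(("A", "S1")): 0.4, frozenset(("B", "S1")): 0.9}
window = candidate_window(proteins, pred_pi=5.1, pred_mw=51000.0,
                          delta_pi=0.25, delta_mw_pct=10.0)
ranked = rank_by_seed_association(window, seeds=["S1"], scores=scores)
print([p.name for p in ranked])
```

In the described workflow the window lookup is performed by TagIdent and the scores come from STRING; the in-memory list and dictionary here merely stand in for those services.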
Figure 2. Influence of pI range (Δ) and Mw range (δ) specified for TagIdent. (A) Influence on the average number of proteins in the candidate list. (B) Influence on recall, that is, the fraction of seed proteins included in their own candidate list as returned by TagIdent. For each identified spot in the 2D-DIGE dataset and all combinations of predefined values for the pI and Mw range, a candidate list was generated following Steps 1–3 of our prioritization approach using LOOCV.
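The two quantities plotted in this figure are simple to compute once the leave-one-out candidate lists exist. The sketch below assumes a hypothetical dictionary that maps each seed protein to the set of candidate names returned for its own held-out spot; the function names and toy data are illustrative and do not come from the paper.

```python
def mean_list_size(candidate_lists):
    """Panel A (sketch): average number of proteins per candidate list."""
    if not candidate_lists:
        raise ValueError("no held-out spots given")
    return sum(len(c) for c in candidate_lists.values()) / len(candidate_lists)


def recall(candidate_lists):
    """Panel B (sketch): fraction of seed proteins found in their own candidate list."""
    if not candidate_lists:
        raise ValueError("no held-out spots given")
    hits = sum(1 for seed, cands in candidate_lists.items() if seed in cands)
    return hits / len(candidate_lists)


# Toy leave-one-out result for four identified spots (invented values).
loocv_lists = {
    "A": {"A", "B"},
    "B": {"C"},
    "C": {"C"},
    "D": set(),
}
print(mean_list_size(loocv_lists), recall(loocv_lists))
```

Repeating this for every (Δ, δ) pair would give one grid of values per panel; widening either range can only keep or enlarge each list, so recall and list size both grow with the window.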
Figure 3. Prioritization performance. (A) True-positive rates (TPR) for the top n = 5, 10, 15, 25 ranked candidates using Steps 1–4 of our prioritization method for all combinations of predefined values for the pI range (Δ) and Mw range (δ). (B) Gain (red) or loss (blue) in TPR w.r.t. the TPR reported in panel (A) when also performing gene expression-based filtering (Step 5). Combinations of pI and Mw range for which the TPR reaches its maximal value are indicated with solid black dots (without filtering) and open magenta dots (with filtering).
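As a rough illustration of the quantities in panels (A) and (B), the sketch below computes a top-n TPR from the rank at which each held-out seed protein appears in its own ranked list, and the gain or loss after filtering. The input format (a list of 1-based ranks, with `None` when the protein is missing from the list) and the function names are assumptions made for this example.

```python
def tpr_at_n(ranks, n):
    """Fraction of held-out spots whose true protein is ranked within the top n.

    `ranks` holds the 1-based rank of the true protein for each spot,
    or None when the protein is absent from the candidate list.
    """
    if not ranks:
        raise ValueError("no held-out spots given")
    hits = sum(1 for r in ranks if r is not None and r <= n)
    return hits / len(ranks)


def tpr_gain(ranks_unfiltered, ranks_filtered, n):
    """Panel B (sketch): positive values are a gain, negative values a loss."""
    return tpr_at_n(ranks_filtered, n) - tpr_at_n(ranks_unfiltered, n)


# Invented ranks for five held-out spots, before and after filtering.
before = [1, 7, None, 12, 30]
after = [1, 4, None, 9, 20]
for n in (5, 10, 15, 25):
    print(n, tpr_at_n(before, n), tpr_gain(before, after, n))
```

Filtering can move a true protein up the list by removing competitors, but it can also remove the true protein itself, which is why the difference may be negative.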
Prioritization performance
| Evidence type | Cut-off | TPR (n = 5) | TPR (n = 10) | TPR (n = 15) | TPR (n = 25) |
|---|---|---|---|---|---|
| Neighbourhood | 0 | 0.238 | 0.276 | 0.305 | 0.352 |
| Gene fusion | 0 | 0.076 | 0.095 | 0.114 | 0.181 |
| Cooccurrence | 0 | 0.2 | 0.248 | 0.257 | 0.286 |
| Coexpression | 0 | 0.41 | 0.514 | | 0.61 |
| Experiments | 0 | 0.295 | 0.343 | 0.39 | 0.457 |
| Database | 0 | 0.229 | 0.295 | 0.343 | 0.39 |
| Textmining | 0 | 0.371 | 0.438 | 0.486 | 0.533 |
| Combined | 0 | 0.438 | | | 0.6 |
| Combined (gene expression-based filter) | 0 | | | | |
| Combined | 0.15 | 0.429 | | | 0.6 |
| Combined | 0.4 | 0.410 | 0.476 | 0.505 | 0.543 |
| Combined | 0.7 | 0.305 | 0.4 | 0.429 | 0.514 |
True-positive rates (TPR) estimated using LOOCV for the top n = 5, 10, 15, 25 ranked candidates using our prioritization approach with single evidence type association scores and combined association scores. STRING association scores below the cut-off value were not taken into account. With a cut-off value of zero, all associations contribute to the overall ranking score. The maximal TPR across all combinations of predefined values for the pI and Mw range is reported. For each value of n, the highest TPR is indicated in bold.
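The cut-off rule in this caption can be illustrated with a small sketch. It assumes, as a simplification not confirmed by the caption, that a candidate's overall ranking score is the sum of its association scores with the seed proteins; the function name, the `scores` dictionary and the toy values are hypothetical.

```python
def ranking_score(candidate, seeds, scores, cutoff=0.0):
    """Sum a candidate's association scores with the seeds, ignoring scores below `cutoff`.

    `scores` maps frozenset({name_a, name_b}) to an association score.
    With cutoff=0.0 every available association contributes.
    """
    total = 0.0
    for seed in seeds:
        s = scores.get(frozenset((candidate, seed)), 0.0)
        if s >= cutoff:
            total += s
    return total


# Invented association scores between candidate "A" and three seeds.
scores = {
    frozenset(("A", "S1")): 0.1,
    frozenset(("A", "S2")): 0.5,
    frozenset(("A", "S3")): 0.8,
}
seeds = ["S1", "S2", "S3"]
for cutoff in (0.0, 0.15, 0.4, 0.7):
    print(cutoff, round(ranking_score("A", seeds, scores, cutoff), 3))
```

A higher cut-off discards weak associations, which removes noise but also removes evidence; the table reports lower TPR values at cut-offs of 0.4 and 0.7 than at 0 for the combined score.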
Figure 4. STRING-based visualization of candidate proteins. Visualization of the top-5 candidate proteins for unidentified spot (x,y)=(669,201) and the seed proteins directly connected to them in STRING. Candidates 1–5 are shown with red highlights of decreasing intensity, highlighted using STRING’s payload mechanism. Connections between proteins indicate the confidence for an association; stronger associations are represented by thicker lines.