Ali Al-Shahib, Raju Misra, Nadia Ahmod, Min Fang, Haroun Shah, Saheer Gharbia.
Abstract
BACKGROUND: Robust biomarkers are needed to improve microbial identification and diagnostics. Proteomics methods based on mass spectrometry can be used for the discovery of novel biomarkers through their high sensitivity and specificity. However, there has been a lack of a coherent pipeline connecting biomarker discovery with established approaches for evaluation and validation. We propose such a pipeline that uses in silico methods for refined biomarker discovery and confirmation.
Year: 2010 PMID: 20796299 PMCID: PMC2939613 DOI: 10.1186/1471-2105-11-437
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Proteomics pipeline for biomarker discovery. IP denotes Identified Peptides and UP denotes Unique Peptides. This process is applied to each search algorithm. The refined lists of markers from the algorithms are then compared, and the peptides on which all algorithms agree are selected as the final validated list of markers.
Number of peptides identified by each algorithm
| Algorithm(s) | Number of peptides from raw files | Number of biomarkers |
|---|---|---|
| X!Tandem | 6920 | 34 |
| Mascot | 4465 | 23 |
| Sequest | 7923 | 12 |
| X!Tandem | 2324 | 17 |
| X!Tandem | 3342 | 12 |
| Sequest | 2241 | 9 |
| X!Tandem | 2081 | 8 |
∩ indicates the overlap between the algorithms specified.
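The overlap counts in a table of this kind can be derived with plain set intersections. The following is a minimal sketch, not the authors' implementation: the function name `overlap_counts`, the dictionary layout, and the toy peptide strings are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Set


def overlap_counts(hits: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count peptides shared by every combination of search algorithms.

    `hits` maps an algorithm name to the set of peptide sequences it
    reported (hypothetical input layout, assumed for this sketch).
    """
    counts: Dict[str, int] = {}
    names = sorted(hits)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Peptides reported by every algorithm in this combination.
            shared = set.intersection(*(hits[name] for name in combo))
            counts[" ∩ ".join(combo)] = len(shared)
    return counts


if __name__ == "__main__":
    # Toy data only; these are not peptides from the study.
    toy = {
        "X!Tandem": {"AAK", "BBK", "CCK"},
        "Mascot": {"AAK", "BBK"},
        "Sequest": {"BBK", "CCK"},
    }
    for label, n in overlap_counts(toy).items():
        print(f"{label}: {n}")
```

Each single-algorithm entry is the total for that algorithm, and each multi-algorithm entry is the size of the corresponding intersection.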
Figure 2. Refinement of peptides. Unique Peptides (UP) were identified by selecting the peptides that had an exact hit in BLAST against the NCBI non-redundant database. These peptides were refined by selecting the consensus of peptides in all biological triplicates (Rep 1 ∩ 2 ∩ 3), followed by selecting the peptides that are not present in non-Clostridium botulinum species (Non-CBOT). The final phase selects only those peptides that are present in all six C. botulinum strains (All strains). We have labelled these peptides as 'Biomarkers'.
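The refinement steps in this caption amount to a short sequence of set operations. The sketch below assumes the unique peptides have already been obtained (the BLAST step is not shown) and that each strain has its own list of replicate peptide sets. The function name `refine_peptides`, the input layout, and the toy peptides are hypothetical and not taken from the paper.

```python
from typing import Mapping, Sequence, Set


def refine_peptides(
    strain_replicates: Mapping[str, Sequence[Set[str]]],
    non_cbot: Set[str],
) -> Set[str]:
    """Return peptides that survive all three refinement steps.

    strain_replicates: strain name -> replicate peptide sets (assumed layout).
    non_cbot: peptides observed in non-C. botulinum species.
    """
    per_strain = []
    for strain, replicates in strain_replicates.items():
        if not replicates:
            raise ValueError(f"no replicates supplied for strain {strain!r}")
        # Step 1: keep peptides seen in every biological replicate.
        consensus = set(replicates[0]).intersection(*replicates[1:])
        # Step 2: drop peptides also found in non-C. botulinum species.
        per_strain.append(consensus - non_cbot)
    if not per_strain:
        return set()
    # Step 3: keep peptides present in every strain.
    return set.intersection(*per_strain)


if __name__ == "__main__":
    # Toy data only; two strains with three replicates each.
    toy = {
        "strain_1": [{"AAK", "BBK", "CCK"}, {"AAK", "BBK"}, {"AAK", "BBK", "DDK"}],
        "strain_2": [{"AAK", "BBK"}, {"AAK", "BBK"}, {"AAK"}],
    }
    print(refine_peptides(toy, non_cbot={"BBK"}))
```

Because all three steps are intersections or differences, the order in which they are applied does not change the final set; the order shown simply follows the caption.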
Figure 3. Non-Clostridium botulinum species union. Venn diagram showing the intersection between all peptides identified from non-Clostridium botulinum species by X!Tandem, Mascot and Sequest.
Figure 4. Overlap and concordance between database search algorithms. Both diagrams show the intersection between the three algorithms in peptide identification. The diagram on the left shows the intersection of the algorithms at identifying peptides prior to in silico validation (first step in the pipeline). The diagram on the right shows the overlap of the different algorithms in biomarker discovery after the complete implementation of the pipeline. The arrow represents the flow of the pipeline to achieve a final refined list of biomarkers.
Sequence and functions of C. botulinum candidate biomarkers
| Rank | Biomarker Sequence | Algorithm(s) | Protein Annotation |
|---|---|---|---|
| 1 | EAEYIFGNFGK | X | Unknown |
| | EDLTNVFDLSER | X | Unknown |
| | ENLITAPENTTIGEAK | X | Metabolism Multiple pathways |
| | GMDTLLESFIK | X | Metabolism Multiple pathways |
| | INEFPEILEYK | X | Enzyme Families |
| | MNEIMQDDTLER | X | Carbohydrate Metabolism |
| | TIVSLDEIEIK | X | Translation |
| | YANIADYLSLGGK | X | Unknown |
| 2 | AAGLEIGETGAIK | X | Unknown |
| | AFIESVEEALEGGEK | X | Replication and Repair |
| | GAYVLNKEEIEK | X | Metabolism Multiple pathways |
| | GEPIVLDNGAVAGQAFR | X | Replication and Repair/Cell motility |
| | IMNDFSMLASK | X | Unknown |
| | NDIVVVSPDLGSVTR | X | Metabolism Multiple pathways |
| | QAATAIDEYLSK | X | Metabolism Multiple pathways |
| | TSAEGIEIVAK | X | Amino Acid Metabolism |
| | DLSTNSWTMIR | X | Replication and Repair |
| | IIDMNNAAIDEGVNAIVK | X | Unknown |
| | TTTIPSMVEALSR | X | Translation |
| 3 | AMESILVVIPAEK | X | Unknown |
| | CLVAEEETGLTTR | X | Metabolism Multiple pathways |
| | EALTEYLLNMSTR | X | Metabolism Multiple pathways |
| | EGSFIYVIGPK | X | Energy Metabolism |
| | GIENVNVFTVR | X | Unknown |
| | GLEVQEEVLNK | X | Nucleotide Metabolism |
| | GTNIVNIIPIENNEK | X | Replication and Repair |
| | GTNIVNLIPIENNEK | X | Replication and Repair |
| | IGNIVEHEETPQISGMIK | X | Translation |
| | ISTVGDVVDYIK | X | Unknown |
| | KWDEDKFEEVMK | X | Unknown |
| | TFGVELEDEPSGK | X | Amino Acid Metabolism |
| | VGIAHNVTPETVEK | X | Amino Acid Metabolism |
| | WYIVDAADKPLGR | X | Translation |
| | LGLAEDEAIESK | M | Membrane Transport |
| | MIGYDIFENEEAK | M | Carbohydrate Metabolism |
| | MIGYDLFENEEAK | M | Carbohydrate Metabolism |
| | QVLSFVTEETVVQR | M | Unknown |
| | SMPALITAISELNQPR | M | Unknown |
| | TRFETNLAVANHLVDK | M | Translation |
X = X!Tandem, M = Mascot, S = Sequest and ∩ = intersection. The first column indicates the rank of the biomarkers. Rank 1 markers have higher confidence than the other markers due to the concordance of the search algorithms in their identification. The second column shows the amino acid sequence of the candidate biomarker. Column 3 shows the algorithm(s) used to identify the candidate biomarker. Where there is no algorithm intersection, the candidate biomarker was exclusively identified by the single algorithm. The final column indicates the functional classification of the proteins (of the listed peptides) obtained from KEGG [21]. The biomarkers fall into four different categories: Metabolism, Genetic Information Processing (Replication and Repair, Cell motility and Translation), Environmental Information Processing (Membrane Transport) and Unknown.
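A ranking of this kind can be sketched by counting how many algorithms report each candidate. The mapping used here (all three algorithms gives rank 1, two gives rank 2, one gives rank 3) is an assumption consistent with the legend rather than a rule stated in it. The function name `rank_biomarkers` and the toy peptides are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_biomarkers(hits: Dict[str, Set[str]]) -> List[Tuple[int, str, str]]:
    """Rank candidates by how many algorithms agree on them.

    `hits` maps an algorithm label (e.g. "X", "M", "S") to its refined
    biomarker set. Assumed rule: rank = number of algorithms - number of
    algorithms that found the peptide + 1, so full concordance is rank 1.
    """
    n_algorithms = len(hits)
    rows = []
    for peptide in set().union(*hits.values()):
        found_by = sorted(name for name, peps in hits.items() if peptide in peps)
        rank = n_algorithms - len(found_by) + 1
        rows.append((rank, peptide, " ∩ ".join(found_by)))
    # Sort by rank, then peptide sequence, for a stable listing.
    return sorted(rows)


if __name__ == "__main__":
    # Toy data only; these are not peptides from the study.
    toy = {"X": {"AAK", "BBK", "CCK"}, "M": {"AAK", "BBK"}, "S": {"AAK"}}
    for rank, peptide, algorithms in rank_biomarkers(toy):
        print(rank, peptide, algorithms)
```

With the toy input, `AAK` is found by all three algorithms and gets rank 1, `BBK` by two and gets rank 2, and `CCK` by one and gets rank 3.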