Keding Cheng, Angela Sloan, Stuart McCorrister, Shawn Babiuk, Timothy R Bowden, Gehua Wang, J David Knox.
Abstract
BACKGROUND: Mass spectrometry (MS) is a very sensitive and specific method for protein identification, biomarker discovery, and biomarker validation. Protein identification is commonly carried out by comparing MS data with public databases. However, with the development of high-throughput and accurate genomic sequencing technology, public databases are being overwhelmed with new entries from different species every day. The application of these databases can also be problematic due to factors such as size, specificity, and unharmonized annotation of the molecules of interest. Current databases used for liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based searches focus on enzyme digestion patterns and sequence information; consequently, important functional information can be missed within the search output. Protein variants displaying sequence homology can interfere with database identification when only certain homologues are examined. In addition, recombinant DNA technology can result in products that may not be accurately annotated in public databases. Curated databases, which focus on the molecule of interest with clearer functional annotation and sequence information, are necessary for accurate protein identification and validation. Here, four cases of curated database application have been explored and summarized.
Year: 2014 PMID: 25011440 PMCID: PMC4102332 DOI: 10.1186/1756-0500-7-444
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Search output produced by searching MS sequence data of various peptides against curated databases (CD) and the public databases, MSDB, NCBInr, and PBR
| Case | Species | Sample | Target protein | Top hit (database 1) | Score | Peptide matches | Top hit (database 2) | Score | Peptide matches |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1ᵃ | Sheeppox virus | SDS-PAGE gel band | Unknown band (104 kD) | MSDB: lumpy disease virus protein | 859 | 51 | PBR: sheeppox virus protein | 1039 | 80 |
| 2ᵇ | Human | In-solution digest | tau, transcript variant 2 (40.27 kD) | NCBInr: PNS specific tau, 78.8 kD | 465 | 29 (17)¶ | CD: tau, transcript variant 2, 40.27 kD | 1615 | 34 (27) |
| 3ᵇ | Sheep-hamster (chimera) | SDS-PAGE gel band | Sheep-hamster chimeric PrP | NCBInr: PrP in Dpc Micelles | 4987 | 1(1) | CD: sheep-hamster chimeric PrP | 3857 | 9(8) |
| 4ᵇ | | In-solution digest | Flagellin H37 | NCBInr: bacterial flagellin ( | 18862 | 31(26) | CD: H37, gi\|30059966\| | 29742 | 33(31) |
ᵃA QSTAR system was used to test the samples, and the Mascot database search was run with 0.4 Da peptide mass tolerance, 0.4 Da MS/MS tolerance, two missed tryptic cleavages, possible methionine oxidation, and all cysteine residues as carboxamidomethyl-cysteine due to alkylation with iodoacetamide.
ᵇAn Orbitrap system was used with 30 ppm peptide mass tolerance, 0.5 Da MS/MS tolerance, and two missed tryptic cleavages for all database searches. Oxidation on methionine and deamidation on glutamine and asparagine were chosen as possible modifications.
¶Numbers without brackets denote total specific peptide match numbers while numbers in brackets denote significant specific peptide match numbers as per the Mascot search engine.
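The 30 ppm peptide mass tolerance quoted for the Orbitrap searches is a relative tolerance, so the accepted window widens with the precursor m/z. The following is a minimal sketch of that arithmetic only; the function name `ppm_window` and the example m/z of 1000.0 are illustrative assumptions and are not taken from the study or from Mascot.

```python
from typing import Tuple


def ppm_window(mz: float, tolerance_ppm: float = 30.0) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds for a relative tolerance in ppm.

    Hypothetical helper for illustration; not part of any search engine API.
    """
    # 1 ppm is one millionth of the measured value.
    delta = mz * tolerance_ppm / 1_000_000
    return mz - delta, mz + delta


# Example with an assumed precursor at m/z 1000.0 and a 30 ppm tolerance.
low, high = ppm_window(1000.0)
print(f"{low:.3f} to {high:.3f}")  # prints "999.970 to 1000.030"
```

At m/z 1000 the window is therefore ±0.03, whereas an absolute tolerance would stay the same width at every m/z.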
Search output produced by searching sheep-hamster PrP MS sequence data against a curated prion protein database (CD) alone and in conjunction with the public database, Swissprot
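| Sample | Score (CD alone) | Peptide matches (CD alone) | Score (CD + Swissprot) | Peptide matches (CD + Swissprot) |
| --- | --- | --- | --- | --- |
| SDS-PAGE gel band (replicate 1) | 4117 | 12(11)¶ | 2232 | 12(10) |
| SDS-PAGE gel band (replicate 2) | 2734 | 10(8) | 1540 | 10(7) |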
¶Numbers without brackets denote total specific peptide match numbers while numbers in brackets denote significant specific peptide match numbers as per the Mascot search engine.
Search output produced by searching flagellin MS sequence data against a curated flagellin database (CD) alone and in conjunction with the public database, Swissprot
| E169 | H1 | H1 | 14607 | 10922 | 57(55)¶ | 57(49) | 98 | 98 |
| E170 | H2 | H2 | 1754 | 1113 | 37(34) | 37(27) | 80 | 80 |
| E171 | H3 | H3 | 8117 | 5735 | 52(46) | 50(39) | 91 | 90 |
| E172 | H4 | H4 | 3894 | 2893 | 28(26) | 28(21) | 89 | 89 |
| E173 | H5 | H5 | 1568 | 1167 | 26(23) | 24(16) | 81 | 74 |
| E174 | H6 | H6 | 6123 | 4513 | 46(44) | 46(38) | 90 | 90 |
| EDL933 | H7 | H7 | 6131 | 4511 | 56(54) | 55(48) | 90 | 90 |
| E176 | H8 | H8 | 5538 | 3916 | 44(43) | 43(39) | 90 | 89 |
| E177 | H9 | H9 | 10426 | 8099 | 53(51) | 52(47) | 80 | 80 |
| E659 | H10 | H10 | 7281 | 5042 | 47(47) | 47(41) | 98 | 98 |
| 902380 | H7 | H7 | 3421 | 2515 | 43(40) | 42(35) | 84 | 82 |
| 050958 | H7 | H7 | 2656 | 1999 | 38(36) | 38(31) | 78 | 78 |
| 090414 | H7 | H7 | 5223 | 3943 | 46(44) | 45(42) | 94 | 94 |
| 091349 | H7 | H7 | 5887 | 4459 | 52(49) | 52(46) | 94 | 94 |
| 091350 | H7 | H7 | 3404 | 2522 | 44(42) | 43(37) | 89 | 88 |
¶Numbers without brackets denote total specific peptide match numbers while numbers in brackets denote significant specific peptide match numbers as per the Mascot search engine.
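The peptide-match notation described in the footnote, a total count optionally followed by the significant count in brackets, can be split into two numbers before comparing searches. The sketch below is one possible way to do that; `PeptideMatches` and `parse_matches` are hypothetical names introduced here, and the tolerance for a trailing ¶ marker or internal spaces is an assumption based on how the cells appear in these tables.

```python
import re
from typing import NamedTuple, Optional


class PeptideMatches(NamedTuple):
    total: int                  # number without brackets
    significant: Optional[int]  # number in brackets, None when absent


# Accepts cells such as "57(55)", "29 (17)¶" or "51".
_CELL = re.compile(r"^\s*(\d+)\s*(?:\((\d+)\))?\s*¶?\s*$")


def parse_matches(cell: str) -> PeptideMatches:
    """Split a 'total(significant)' table cell into its two counts."""
    m = _CELL.match(cell)
    if m is None:
        raise ValueError(f"unrecognised peptide-match cell: {cell!r}")
    significant = int(m.group(2)) if m.group(2) is not None else None
    return PeptideMatches(int(m.group(1)), significant)


# Strain E169: the same total, but fewer significant matches in the second search.
first, second = parse_matches("57(55)¶"), parse_matches("57(49)")
print(first.significant - second.significant)  # prints 6
```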
Top hits produced by searching flagellin MS data against a curated flagellin database (CD) and the public databases, Swiss-prot and NCBInr
| E169 | H1 | H1 | flagellin [ | |
| E170 | H2 | H2 | flagellin [ | |
| E171 | H3 | H3 | flagellin [ | |
| E172 | H4 | H4 | flagellin [ | |
| E173 | H5 | H5 | ||
| E174 | H6 | H6 | FliC [ | |
| EDL933 | H7 | H7 | flagellin [ | |
| E176 | H8 | H8 | flagellin [ | |
| E177 | H9 | H9 | flagellin [ | |
| E659 | H10 | H10 | flagellin [ | |
| 902380 | H7 | H7 | flagellin [ | |
| 050958 | H7 | H7 | flagellin [ | |
| 090414 | H7 | H7 | flagellin [ | |
| 091349 | H7 | H7 | flagellin [ | |
| 091350 | H7 | H7 | flagellin [ |
ᵃAn Orbitrap system was used with 30 ppm peptide mass tolerance, 0.5 Da MS/MS tolerance, and one missed tryptic cleavage for all database searches. Oxidation on methionine and deamidation on glutamine and asparagine were chosen as possible modifications.
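The missed-cleavage setting in the search parameters determines which theoretical peptides a database sequence can contribute. As a rough sketch, assuming the commonly used trypsin rule (cleave after K or R, but not before P), which the source does not state explicitly, an in-silico digest could look like the following. The function `tryptic_peptides` and the toy sequence are hypothetical and do not reproduce the search engine's own implementation.

```python
from typing import List


def tryptic_peptides(sequence: str, max_missed: int = 1) -> List[str]:
    """List peptides from an in-silico tryptic digest.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    max_missed is the number of cleavage sites allowed to remain uncut.
    """
    if not sequence:
        return []
    # Collect cut positions, including both ends of the sequence.
    cuts = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for start in range(len(cuts) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(cuts):
                break
            peptides.append(sequence[cuts[start]:cuts[end]])
    return peptides


# Toy sequence, one missed cleavage allowed.
print(tryptic_peptides("AKRPGKC", max_missed=1))
# prints ['AK', 'AKRPGK', 'RPGK', 'RPGKC', 'C']
```

Raising `max_missed` enlarges the set of candidate peptides, which is one reason the setting is reported alongside the mass tolerances.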