John Marshall, Peter Bowden, Jean Claude Schmit, Fay Betsou.
Abstract
Protein biomarkers offer major benefits for diagnosis and monitoring of disease processes. Recent advances in protein mass spectrometry make it feasible to use this very sensitive technology to detect and quantify proteins in blood. To explore the potential of blood biomarkers, we conducted a thorough review to evaluate the reliability of data in the literature and to determine the spectrum of proteins reported to exist in blood, with the goal of creating a Federated Database of Blood Proteins (FDBP). A unique feature of our approach is the use of a SQL database for all of the peptide data; the power of the SQL database combined with standard informatic algorithms such as BLAST and the statistical analysis system (SAS) allowed the rapid annotation and analysis of the database without the need to create special programs to manage the data. Our mathematical analysis and review show that, in addition to the usual secreted proteins found in blood, there are many reports of intracellular proteins and good agreement on transcription factors and DNA remodelling factors, in addition to cellular receptors and their signal transduction enzymes. Overall, we have catalogued about 12,130 proteins identified by at least one unique peptide, and of these 3858 have 3 or more peptide correlations. The FDBP with annotations should facilitate testing blood for specific disease biomarkers.
Year: 2014 PMID: 24476026 PMCID: PMC4015845 DOI: 10.1186/1559-0275-11-3
Source DB: PubMed Journal: Clin Proteomics ISSN: 1542-6416 Impact factor: 3.988
Figure 1. The probability of homology between a subset of 27,254 distinct blood proteins as determined by the BLAST algorithm. Note that about eight thousand protein matches showed probability values less than E-180 (machine 0) and so are not shown.
Figure 2. The distribution of gap openings in homologous proteins as calculated by BLAST. Note that almost 9000 protein matches showed perfect alignments with no gaps in the matched amino acid sequences. In contrast, a small subset of about one thousand proteins showed three or more gaps in the matched sequence.
Figure 3. The distribution of log protein match alignment lengths. Note that almost 13,000 protein matches showed protein alignments of greater than 100 contiguous amino acids. Typically, a contiguous stretch of 20 amino acids is considered sufficient evidence to indicate a potential structural relationship between proteins.
Figure 4. The plot of log mismatches versus protein match number. Note that more than seven thousand proteins had few or no mismatches along the protein length. In contrast, about four thousand proteins showed between 10 and one thousand mismatches along the matched protein length.
Figure 5. The plot of percentage identity between protein matches. Note that some twelve thousand protein matches show at least 70% identity over the full length of the query sequence, which typically indicates a strong structural relationship between the protein sequences.
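The quantities summarized in Figures 1 to 5 (E-value, gap openings, alignment length, mismatches and percentage identity) correspond to columns of BLAST tabular output. The following is a minimal sketch of how such a summary could be tallied, assuming the default 12-column tabular format (`-outfmt 6`); the source does not state which output format was used, and the function names, thresholds and example rows below are illustrative only, not data from the study.

```python
from collections import Counter

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_hits(lines):
    """Yield one dict per non-empty, non-comment tabular BLAST line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        for key in ("length", "mismatch", "gapopen"):
            row[key] = int(row[key])
        yield row


def summarize(hits, min_identity=70.0, min_length=100):
    """Count hits in the categories discussed in the figure captions."""
    summary = Counter()
    for hit in hits:
        summary["total"] += 1
        if hit["evalue"] == 0.0:          # reported as zero ("machine 0")
            summary["evalue_zero"] += 1
        if hit["gapopen"] == 0:           # perfect, ungapped alignment
            summary["no_gaps"] += 1
        if hit["gapopen"] >= 3:
            summary["three_or_more_gaps"] += 1
        if hit["length"] > min_length:
            summary["long_alignments"] += 1
        if hit["pident"] >= min_identity:
            summary["high_identity"] += 1
    return summary


# Hypothetical example rows (made-up identifiers and values).
EXAMPLE_LINES = [
    "P1\tP2\t98.5\t250\t3\t0\t1\t250\t1\t250\t0.0\t510",
    "P1\tP3\t72.0\t120\t30\t4\t5\t124\t8\t127\t1e-40\t160",
    "P4\tP5\t45.0\t60\t30\t1\t1\t60\t3\t62\t2e-05\t48.1",
]

if __name__ == "__main__":
    for name, count in sorted(summarize(parse_hits(EXAMPLE_LINES)).items()):
        print(name, count)
```

The 70% identity and 100-residue thresholds mirror the values mentioned in the captions; both are exposed as parameters so that other cut-offs can be tried.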
Figure 6. The log peptide-to-protein distribution of the human blood proteins. A set of published human blood data was parsed into SQL, and the distributions of the data were derived and graphed in SAS JMP.
Figure 7. The plot of distinct peptide count versus distinct protein number. Note that about 12,000 proteins were detected by only 1 peptide. In contrast, a total of 10,138 distinct protein sequences were correlated by 3 or more different peptide sequences.
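The peptide-to-protein distribution described in Figures 6 and 7 can be obtained directly with SQL aggregation. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `peptide_hits`, its two columns, and the example rows are hypothetical; the schema of the actual FDBP is not given in the text.

```python
import sqlite3

# Distinct peptides per protein; duplicates of the same peptide count once.
PER_PROTEIN = """
    SELECT protein_id, COUNT(DISTINCT peptide_seq) AS n_peptides
    FROM peptide_hits
    GROUP BY protein_id
"""


def build_example_db():
    """Create an in-memory database with a few made-up peptide hits."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE peptide_hits (protein_id TEXT, peptide_seq TEXT)")
    rows = [
        ("PROT_A", "PEPTIDEA"), ("PROT_A", "PEPTIDEB"),
        ("PROT_A", "PEPTIDEC"), ("PROT_A", "PEPTIDEA"),  # repeated peptide
        ("PROT_B", "PEPTIDED"),
        ("PROT_C", "PEPTIDEE"), ("PROT_C", "PEPTIDEF"),
    ]
    conn.executemany("INSERT INTO peptide_hits VALUES (?, ?)", rows)
    return conn


def peptide_distribution(conn):
    """Return (n_peptides, n_proteins) pairs, ordered by n_peptides."""
    query = f"""
        SELECT n_peptides, COUNT(*) AS n_proteins
        FROM ({PER_PROTEIN}) AS t
        GROUP BY n_peptides
        ORDER BY n_peptides
    """
    return conn.execute(query).fetchall()


def proteins_with_at_least(conn, k):
    """Count proteins supported by k or more distinct peptides."""
    query = f"SELECT COUNT(*) FROM ({PER_PROTEIN}) AS t WHERE n_peptides >= ?"
    return conn.execute(query, (k,)).fetchone()[0]


if __name__ == "__main__":
    db = build_example_db()
    print(peptide_distribution(db))
    print(proteins_with_at_least(db, 3))
```

Using `COUNT(DISTINCT ...)` keeps repeated observations of the same peptide from inflating the evidence for a protein, which matters when data from several publications are pooled.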
The distribution of cell location in the blood protein SQL database
| Cellular location | Count | Fraction of total |
| --- | --- | --- |
| Total | 22926 | 1 |
| Nucleus, | 2958 | 0.12902 |
| Membrane, integral to membrane, | 1330 | 0.05801 |
| Cytoplasm, | | 0.03533 |
| Extracellular region, | 624 | 0.02722 |
| Integral to membrane, | 531 | 0.02316 |
| Intracellular, | 447 | 0.0195 |
| Nucleus, cytoplasm, | 414 | 0.01806 |
| Intracellular, nucleus, | 403 | 0.01758 |
| Extracellular space, | 363 | 0.01583 |
| Membrane, | 298 | 0.013 |
| Mitochondrion, | 269 | 0.01173 |
| Plasma membrane, integral to membrane, | 265 | 0.01156 |
| Extracellular region, extracellular space, | 264 | 0.01152 |
| Cellular_component, | 203 | 0.00885 |
| Ubiquitin ligase complex, | 200 | 0.00872 |
| Ubiquitin ligase complex, | 191 | 0.00833 |
| Extracellular region, proteinaceous extracellular matrix, | 179 | 0.00799 |
| Nucleus, cytoplasm, | 142 | 0.00619 |
| Nucleus, nucleus, | 131 | 0.00571 |
| Plasma membrane, integral to plasma membrane, | 129 | 0.00563 |
| Integral to plasma membrane, membrane, | 125 | 0.00545 |
| Plasma membrane, integral to plasma membrane, | 103 | 0.00449 |
| Cytoskeleton, | 95 | 0.00414 |
| Proteinaceous extracellular matrix, | 93 | 0.00406 |
| Endoplasmic reticulum, endoplasmic reticulum membrane, membrane, integral membrane, | 80 | 0.00349 |
| Nucleosome, nucleus, chromosome, | 78 | 0.0034 |
| Intracellular, ribosome, | 74 | 0.00323 |
| Plasma membrane, | 73 | 0.00318 |
| Intracellular, cytoplasm, | 72 | 0.00314 |
| Lysosome, | 72 | 0.00314 |
| Intracellular, nucleus, cytoplasm, | 71 | 0.0031 |
| Actin cytoskeleton, | 70 | 0.00305 |
| Endoplasmic reticulum, | 69 | 0.00301 |
| Cytoplasm, cytoskeleton, | 68 | 0.00297 |
| Plasma membrane, integral to membrane, | 66 | 0.00288 |
| Cytosol, | 65 | 0.00284 |
| Intracellular, nucleus, | 64 | 0.00279 |
| Membrane fraction, integral to plasma membrane, | 64 | 0.00279 |
| Ubiquitin ligase complex, nucleus, | 63 | 0.00275 |
| Membrane fraction, integral to plasma membrane, membrane, | 60 | 0.00262 |
For more complete information see Additional files.
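The third column of the table appears to be each count divided by the total number of annotations (22926), rounded to five decimal places. A minimal sketch of that calculation follows, using three rows from the table as input; the function name is illustrative.

```python
def annotation_fractions(counts, total, digits=5):
    """Return each annotation count as a rounded fraction of the total."""
    return {label: round(n / total, digits) for label, n in counts.items()}


# Counts copied from three rows of the cell location table.
counts = {
    "Nucleus": 2958,
    "Membrane, integral to membrane": 1330,
    "Extracellular region": 624,
}
fractions = annotation_fractions(counts, total=22926)

if __name__ == "__main__":
    for label, fraction in fractions.items():
        print(f"{label}: {fraction:.5f}")
```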
The distribution of molecular functions in the blood protein SQL database
| Molecular function | Count | Fraction of total |
| --- | --- | --- |
| Total | 24031 | 1 |
| Protein binding, | 1373 | 0.05713 |
| DNA binding, | 348 | 0.01448 |
| Binding, | 340 | 0.01415 |
| Calcium ion binding, | 226 | 0.0094 |
| Transcription factor activity, | 225 | 0.00936 |
| Structural molecule activity, | 196 | 0.00816 |
| Receptor activity, | 193 | 0.00803 |
| Calcium ion binding, protein binding, | 185 | 0.0077 |
| DNA binding, zinc ion binding, | 179 | 0.00745 |
| DNA binding, zinc ion binding, metal ion binding, | 159 | 0.00662 |
| Structural constituent of ribosome, | 149 | 0.0062 |
| Nucleic acid binding, zinc ion binding, | 134 | 0.00558 |
| Protein binding, zinc ion binding, metal ion binding, | 125 | 0.0052 |
| Catalytic activity, | 109 | 0.00454 |
| Nucleic acid binding, | 106 | 0.00441 |
| Calcium ion binding, protein binding, | 104 | 0.00433 |
| GTPase activator activity, | 104 | 0.00433 |
| Extracellular matrix structural constituent, | 101 | 0.0042 |
| RNA binding, | 89 | 0.0037 |
| Ubiquitin-protein ligase activity, zinc ion binding, | 89 | 0.0037 |
| Serine-type endopeptidase inhibitor activity, | 88 | 0.00366 |
| Receptor activity, olfactory receptor activity, | 85 | 0.00354 |
| Transcription factor activity, zinc ion binding, | 81 | 0.00337 |
| Transporter activity, | 79 | 0.00329 |
| Actin binding, | 76 | 0.00316 |
| Signal transducer activity, | 76 | 0.00316 |
| Nucleotide binding, protein serine/threonine kinase | 75 | 0.00312 |
| Nucleotide binding, ATP binding, | 68 | 0.00283 |
| Nucleotide binding, RNA binding, | 66 | 0.00275 |
| Transcription factor, sequence-specific DNA binding | 63 | 0.00262 |
| Receptor binding, | 62 | 0.00258 |
| Hydrolase activity, | 60 | 0.0025 |
| Nucleotide binding, RNA binding, protein binding, | 59 | 0.00246 |
| Protein binding, zinc ion binding, | 55 | 0.00229 |
| Structural constituent of cytoskeleton, | 51 | 0.00212 |
| Nucleic acid binding, zinc ion binding, metal ion binding, | 50 | 0.00208 |
| Molecular_function, protein binding, | 48 | 0.002 |
| RNA binding, protein binding, | 48 | 0.002 |
| Sugar binding, | 48 | 0.002 |
| Transcription factor activity, RNA polymerase II | 46 | 0.00191 |
| DNA binding, protein binding, zinc ion binding, | 45 | 0.00187 |
| Actin binding, actin binding, actin binding, structural | 44 | 0.00183 |
| Growth factor activity, | 44 | 0.00183 |
For more complete information see Additional files.
The distribution of biological processes in the blood protein SQL database
| Biological process | Count | Fraction of total |
| --- | --- | --- |
| Total | 22069 | 1 |
| Transcription, regulation of transcription, DNA-dependent, | 811 | 0.03674 |
| Regulation of transcription, DNA-dependent, | 282 | 0.01278 |
| Proteolysis, | 278 | 0.0126 |
| Transport, | 250 | 0.01133 |
| Translation, | 244 | 0.01106 |
| Signal transduction, | 209 | 0.00947 |
| Cell adhesion, | 173 | 0.00784 |
| Metabolic process, | 164 | 0.00743 |
| Protein amino acid phosphorylation, | 153 | 0.00693 |
| Protein ubiquitination, | 139 | 0.0063 |
| Ubiquitin cycle, | 134 | 0.00607 |
| Electron transport, | 122 | 0.00553 |
| Intracellular signaling cascade, | 114 | 0.00517 |
| Protein amino acid dephosphorylation, | 105 | 0.00476 |
| Multicellular organismal development, | 101 | 0.00458 |
| Phosphate transport, | 98 | 0.00444 |
| Protein folding, | 90 | 0.00408 |
| Cell adhesion, homophilic cell adhesion, | 86 | 0.0039 |
| Immune response, | 85 | 0.00385 |
| Microtubule-based movement, | 85 | 0.00385 |
| Signal transduction, G-protein coupled receptor protein | 82 | 0.00372 |
| Protein transport, | 81 | 0.00367 |
| mRNA processing, RNA splicing, | 75 | 0.0034 |
| Nucleosome assembly, | 72 | 0.00326 |
| Epidermis development, | 61 | 0.00276 |
| Cell adhesion, homophilic cell adhesion, | 58 | 0.00263 |
| Small GTPase mediated signal transduction, | 57 | 0.00258 |
| Cation transport, calcium ion transport, | 56 | 0.00254 |
| Cell surface receptor linked signal transduction, | 56 | 0.00254 |
| Cytoskeletal anchoring, | 55 | 0.00249 |
| Signal transduction, G-protein coupled receptor protein factor | 55 | 0.00249 |
| Signal transduction, G-protein coupled receptor protein | 47 | 0.00213 |
| Protein modification, | 46 | 0.00208 |
| Ion transport, | 45 | 0.00204 |
| Nucleosome assembly, chromosome organization | 45 | 0.00204 |
| Muscle contraction, cytoskeletal anchoring, development | 44 | 0.00199 |
| rRNA processing, | 44 | 0.00199 |
| Spermatogenesis, | 44 | 0.00199 |
| Carbohydrate metabolic process, | 43 | 0.00195 |
| Muscle development, | 43 | 0.00195 |
| Acute-phase response, | 41 | 0.00186 |
| Glycolysis, | 41 | 0.00186 |
| Intracellular protein transport, | 41 | 0.00186 |
For further information see Additional files.
Figure 8. The contents of the database were queried for transcription-associated proteins and are shown without filtering. The full list of factors may be found in Additional file 1. The figure was produced using STRING evidence view. Colors: Green gene neighborhood; red gene fusion; blue concurrence; black co-expression; purple experiments; cyan databases; yellow text mining; and grey homology.
Figure 9. DNA remodeling factors in human blood. The contents of the database were queried for DNA remodeling-associated proteins and are shown without filtering. The full list of factors may be found in Additional file 2. The figure was produced using STRING evidence view. Colors: Green gene neighborhood; red gene fusion; blue concurrence; black co-expression; purple experiments; cyan databases; yellow text mining; and grey homology.
Figure 10. RNA binding and zinc finger proteins. The contents of the database were queried for RNA binding or zinc finger-associated proteins and are shown with filtering at n = 3. The full list of factors may be found in Additional file 3. The figure was produced using STRING evidence view. Colors: Green gene neighborhood; red gene fusion; blue concurrence; black co-expression; purple experiments; cyan databases; yellow text mining; and grey homology.
Figure 11. The contents of the database were queried for secretion-associated proteins and are shown without filtering. The full list of proteins may be found in Additional file 4. The figure was produced using STRING evidence view. Colors: Green gene neighborhood; red gene fusion; blue concurrence; black co-expression; purple experiments; cyan databases; yellow text mining; and grey homology.
Figure 12. The receptor and signal transduction proteins in human blood serum or plasma. The contents of the database were queried for receptors, kinases, phosphatases and cell signalling-associated proteins and are shown with filtering at n = 5. The full list of factors may be found in Additional file 5. The figure was produced using STRING evidence view. Colors: Green gene neighborhood; red gene fusion; blue concurrence; black co-expression; purple experiments; cyan databases; yellow text mining; and grey homology.
Figure 13. The cytokine, chemokine and interleukin proteins of human blood plasma or serum. The contents of the database were queried for cytokines, chemokines, interleukins and tumor necrosis factor-associated proteins and are shown without filtering. The full list of factors may be found in Additional file 6. The figure was produced using STRING evidence view. Colors: Green gene neighborhood; red gene fusion; blue concurrence; black co-expression; purple experiments; cyan databases; yellow text mining; and grey homology.
Figure 14. The growth factor proteins of human blood plasma or serum. The contents of the database were queried for growth factor-associated proteins and are shown without filtering. The full list of factors may be found in Additional file 7. The figure was produced using STRING evidence view. Colors: Green gene neighborhood; red gene fusion; blue concurrence; black co-expression; purple experiments; cyan databases; yellow text mining; and grey homology.
Figure 15. Note that exosomes may contain proteins such as ligands, receptors and transcription factors, or RNA and potentially DNA, that may alter the target cell's fate, differentiation or functions.