Abstract
Collagen is one of the most ubiquitous proteins in the animal kingdom and the dominant protein in extracellular tissues such as bone, skin and other connective tissues, in which it acts primarily as a supporting scaffold. It has been widely investigated scientifically, not only as a biomedical material for regenerative medicine, but also for its role as a food source for both humans and livestock. Owing to the long-term stability of collagen, as well as its abundance in bone, it has been proposed as a source of biomarkers for species identification, not only in heat- and pressure-rendered animal feed but also in ancient archaeological and palaeontological specimens. Such identification is typically carried out by peptide mass fingerprinting (PMF) as well as by in-depth liquid chromatography (LC)-based tandem mass spectrometric methods. Through the analysis of the three most common domesticated species (cow, sheep and pig), this research compares the advantages of each approach and investigates sites of sequence variation in relation to known functional properties of the collagen molecule. Results indicate that the species biomarkers previously identified through PMF analysis are not among the most variable type I collagen peptides present in these tissues, although these more variable peptides can be detected by LC-based methods. However, it is clear that the highly repetitive sequence motif of collagen throughout the molecule, combined with the variability in both the sites and relative abundance of hydroxylation, can result in high-scoring false positive peptide matches when using these LC-based methods. Additionally, the greater sequence variation of the alpha 2(I) chain compared with the alpha 1(I) chain did not appear to be associated with any particular functional properties, implying that intra-chain functional constraints on sequence variation are not as great as inter-chain constraints.
However, although some of the most variable peptides were observed only with LC-based methods, until the range of publicly available collagen sequences improves, the simplicity of the PMF approach and the suitable range of peptide sequence variation it captures make it the ideal method for initial taxonomic identification, prior to further analysis by LC-based methods only when required.
Keywords: bone collagen; collagen function; hydroxylation; peptide mass fingerprinting; variability
Year: 2016 PMID: 27023524 PMCID: PMC4848901 DOI: 10.3390/ijms17040445
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Plots of the number of amino acid substitutions between cattle (Bos), sheep (Ovis) and pig (Sus) collagen tryptic peptides for the α1(I) (top) and α2(I) (bottom) chains. Species biomarkers A–G presented in Buckley et al. [29] are indicated with arrows and labelled accordingly.
Figure 2. Matrix-assisted laser desorption/ionization time-of-flight (MALDI-ToF) mass spectra of collagen tryptic digests from Bos (top) and Sus (bottom) bone, annotated with peptide labels relating to their position in the α chains. Peptide 2t3 is subject to an additional mass shift because the substitution replaces a proline residue that is predominantly hydroxylated in Bos ("/" indicates a missed cleavage site, i.e., the presence of an internal K or R residue; "&" indicates that more than one peptide is observed at a similar m/z value).
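The peptide labels used in the tables and figures (e.g., 1t5, 2t75/76) follow from an in silico tryptic digest of each chain. The Python sketch below illustrates that bookkeeping; it is not the authors' pipeline. It assumes cleavage after every K or R with the usual proline rule ignored (which matches how the tables split 1t5 NGDDGEAGK from 1t6 PGR), uses standard monoisotopic residue masses, and treats each hydroxylation as +15.9949 Da. The names `mh_plus` and `tryptic_digest` are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues found in the tables
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333,
}
WATER = 18.01056          # added once per peptide (termini)
PROTON = 1.00728          # singly protonated [M+H]+
HYDROXYLATION = 15.99491  # +O per hydroxyproline or hydroxylysine


def mh_plus(peptide: str, n_hydroxyl: int = 0) -> float:
    """Monoisotopic [M+H]+ of a peptide carrying n_hydroxyl hydroxylations."""
    residues = sum(RESIDUE_MASS[aa] for aa in peptide)
    return residues + WATER + PROTON + n_hydroxyl * HYDROXYLATION


def tryptic_digest(sequence: str, prefix: str, first_number: int = 1,
                   max_missed: int = 1) -> list:
    """Return (label, peptide) pairs, cutting after every K or R.

    Missed-cleavage products join consecutive labels with '/', e.g. '2t75/76'.
    """
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal peptide lacking K/R
    products = []
    for first in range(len(pieces)):
        for missed in range(max_missed + 1):
            last = first + missed
            if last >= len(pieces):
                break
            label = f"{prefix}{first + first_number}"
            if missed:
                label += f"/{last + first_number}"
            products.append((label, "".join(pieces[first:last + 1])))
    return products


if __name__ == "__main__":
    # Bos alpha2(I) segment covering 2t75 and 2t76 in the COL1A2 table
    for label, pep in tryptic_digest("HGNRGEPGPAGAVGPAGAVGPR", "2t", first_number=75):
        print(f"{label:10s} {pep:25s} {mh_plus(pep):10.4f}")
```

Run on the Bos segment covering 2t75 and 2t76, it yields the two single peptides plus the 2t75/76 missed-cleavage product, each with its [M+H]+ value. Passing `n_hydroxyl` shows how a predominantly hydroxylated proline, as in 2t3, shifts a marker by about 16 Da.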
Figure 3. Sections of the MALDI fingerprints highlighting homologous markers between Bos (top) and Ovis (bottom) tryptic collagen peptides: (A) 2t39; (B) 2t34 and 2t85; (C) 2t76 (the Ovis marker is at the same m/z as other collagen peptides); (D) 2t75/76 (* the Bos form of 2t75, HGNR, contains an amino acid susceptible to deamidation and could therefore be mistaken for the homologous marker in Ovis and Sus); and (E) 1t55/56 ("/" indicates a missed cleavage site, i.e., the presence of an internal K or R residue).
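The deamidation caveat in panel (D) can be checked with simple arithmetic. The sketch below assumes the 2t75/76 sequences reconstructed from the hyphen notation in the COL1A2 table and standard monoisotopic residue masses; the variable names are illustrative. The Ovis substitutions (N→S in 2t75, A→V in 2t76) shift the Bos peptide by about +1.020 Da, while deamidation of the Bos Asn (N→D) adds about +0.984 Da, leaving the two roughly 0.04 Da apart. The same calculation shows that the Sus A→S substitution adds exactly one oxygen, the same elemental change as one extra hydroxylation of the Bos form.

```python
# Monoisotopic residue masses (Da) for the residues in 2t75/76
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "N": 114.04293, "D": 115.02694, "E": 129.04259,
    "H": 137.05891, "R": 156.10111,
}
WATER, PROTON, OXYGEN = 18.01056, 1.00728, 15.99491


def mh_plus(peptide: str, extra: float = 0.0) -> float:
    """Monoisotopic [M+H]+ plus an optional modification mass."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON + extra


# 2t75/76 missed-cleavage peptide, expanded from the table's hyphen notation
FORMS = {
    "Bos": "HGNRGEPGPAGAVGPAGAVGPR",
    "Ovis": "HGSRGEPGPVGAVGPAGAVGPR",  # N->S in 2t75, A->V in 2t76
    "Sus": "HGNRGEPGPAGSVGPAGAVGPR",   # A->S in 2t76
}
masses = {taxon: mh_plus(seq) for taxon, seq in FORMS.items()}
bos_deamidated = mh_plus("HGDRGEPGPAGAVGPAGAVGPR")  # Asn -> Asp
bos_plus_oh = mh_plus(FORMS["Bos"], OXYGEN)        # one extra hydroxylation

if __name__ == "__main__":
    for taxon, m in masses.items():
        print(f"{taxon:5s} {m:.4f}  (delta vs Bos {m - masses['Bos']:+.4f})")
    print(f"Bos deamidated {bos_deamidated:.4f}")
    print(f"Bos + 1 OH     {bos_plus_oh:.4f}")
```

At MALDI-ToF resolution such near-coincident values are easy to confuse, which is why the caption flags this marker.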
Collagen alpha 1(I) (COL1A1) peptide sequences showing amino acid variations between artiodactyl taxa (the main sequence is Bos; a hyphen indicates a residue identical to the main sequence; variant sequences are followed by O = Ovis and S = Sus). (√) indicates that the precursor was observed in at least 2 of 3 fingerprints (peptide mass fingerprinting (PMF) data); shaded cells indicate that the peptide was not observed in at least 2 of 3 liquid chromatography (LC)-based results (see Tables S1–S3); a single letter under "Peptide label" indicates a PMF species biomarker from Buckley et al. [29].
| Peptide Label * | Sequence | Peptide Label * | Sequence |
|---|---|---|---|
| 1t1 | QLSYGYDEK | 1t47 (√) | GVQGPPGPAGPR |
| 1t2 (√) | STGISVPGPMGPSGPR; -A--------------(S) | 1t48 | GANGAPGNDGAK |
| 1t3 (√) | GLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPR | 1t49 (√) | GDAGAPGAPGSQGAPGLQGMPGER |
| 1t4 | GPPGPPGK | 1t50 | GAAGLPGPK |
| 1t5 | NGDDGEAGK | 1t51 | GDR |
| 1t6 | PGR | 1t52 | GDAGPK |
| 1t7 | PGER | 1t53 | GADGAPGK |
| 1t8 (√) | GPPGPQGAR | 1t54 | DGVR |
| 1t9 | GLPGTAGLPGMK | 1t55 (F) (√) | GLTGPIGPPGPAGAPGDK |
| 1t10 | GHR | 1t56 (F) (√) | GEAGPSGPAGPTGAR; --T------------(O; S) |
| 1t11 | GFSGLDGAK | 1t57 (√) | GAPGDR |
| 1t12 | GDAGPAGPK | 1t58 (√) | GEPGPPGPAGFAGPPGADGQPGAK |
| 1t13 (√) | GEPGSPGENGAPGQMGPR; ----------T-------(O) | 1t59 | GEPGDAGAK |
| 1t14 | GLPGER | 1t60 | GDAGPPGPAGPAGPPGPIGNVGAPGPK; -----------T-------S-------(S) |
| 1t15 (√) | GR | 1t61 | GAR |
| 1t16 (√) | PGAPGPAGAR; --P-------(S) | 1t62 (√) | GSAGPPGATGFPGAAGR |
| 1t17 | GNDGATGAAGPPGPTGPAGPPGFPGAVGAK | 1t63 | VGPPGPSGNAGPPGPPGPAGK |
| 1t18 | GEGGPQGPR; --A------(O); --A----A-(S) | 1t64 | EGSK |
| 1t19 (√) | GSEGPQGVR | 1t65 | GPR |
| 1t20 | GEPGPPGPAGAAGPAGNPGADGQPGAK; -------------------------G-(S) | 1t66 | GETGPAGR |
| 1t21 (√) | GANGAPGIAGAPGFPGAR | 1t67 | PGEVGPPGPPGPAGEK; A---------------(O); ---A------------(S) |
| 1t22 | GPSGPQGPSGPPGPK | 1t68 (√) | GAPGADGPAGAPGTPGPQGIAGQR; -S----------------------(S) |
| 1t23 | GNSGEPGAPGSK | 1t69 (√) | GVVGLPGQR |
| 1t24 | GDTGAK | 1t70 | GER |
| 1t25 (√) | GEPGPTGIQGPPGPAGEEGK; -------V------------(S) | 1t71 | GFPGLPGPSGEPGK |
| 1t26 (√) | R | 1t72 | QGPSGASGER; -----P----(S) |
| 1t27 | GAR | 1t73 (√) | GPPGPMGPPGLAGPPGESGR |
| 1t28 (√) | GEPGPAGLPGPPGER | 1t74 | EGAPGAEGSPGR |
| 1t29 | GGPGSR | 1t75 | DGSPGAK; --A----(O); --A--P-(S) |
| 1t30 | GFPGADGVAGPK | 1t76 | GDR |
| 1t31 | GPAGER | 1t77 (√) | GETGPAGPPGAPGAPGAPGPVGPAGK; --S-----------------------(S) |
| 1t32 | GAPGPAGPK; -S-------(S) | 1t78 (√) | SGDR |
| 1t33 | GSPGEAGR | 1t79 (√) | GETGPAGPAGPIGPVGAR; -----------V------(S) |
| 1t34 | PGEAGLPGAK | 1t80 (√) | GPAGPQGPR |
| 1t35 | GLTGSPGSPGPDGK | 1t81 | GDK |
| 1t36 (√) | TGPPGPAGQDGR | 1t82 | GETGEQGDR |
| 1t37 (√) | PGPPGPPGAR | 1t83 | GIK |
| 1t38 (√) | GQAGVMGFPGPK | 1t84 | GHR |
| 1t39 | GAAGEPGK | 1t85 (√) | GFSGLQGPPGPPGSPGEQGPSGASGPAGPR |
| 1t40 | AGER | 1t86 (√) | GPPGSAGSPGK; -------T---(O); -------A---(S) |
| 1t41 | GVPGPPGAVGPAGK | 1t87 (√) | DGLNGLPGPIGPPGPR |
| 1t42 | DGEAGAQGPPGPAGPAGER | 1t88 | GR |
| 1t43 | GEQGPAGSPGFQGLPGPAGPPGEAGK | 1t89 | TGDAGPAGPPGPPGPPGPPGPPSGGYDLSFLPQPPQEK; ------V------------------F-F----------(S) |
| 1t44 | PGEQGVPGDLGAPGPSGAR | 1t90 | AHDGGR |
| 1t45 | GER | 1t91 | YYR |
| 1t46 | GFPGER | 1t92 | A |
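The hyphen notation described in the table caption can be expanded mechanically, which is essentially what counting substitutions per tryptic peptide (as plotted in Figure 1) requires. The Python sketch below is a hypothetical helper, not the authors' code. It assumes the main (Bos) sequence comes first, each variant is followed by its taxon code(s) in parentheses, and variants are separated by semicolons, as in most rows; `parse_cell` and `pairwise_counts` are illustrative names.

```python
import re
from itertools import combinations

TAXON_CODES = {"O": "Ovis", "S": "Sus"}
# A variant pattern followed by its taxon code(s), e.g. "--A------(O)" or "--T---(O; S)"
VARIANT = re.compile(r"([-A-Z]+)\s*\(([^)]*)\)")


def parse_cell(cell: str) -> dict:
    """Expand a cell such as 'GEGGPQGPR; --A------(O)' into per-taxon sequences."""
    main, _, rest = cell.partition(";")
    bos = main.strip()
    seqs = {"Bos": bos, "Ovis": bos, "Sus": bos}  # unlisted taxa match Bos
    for pattern, codes in VARIANT.findall(rest):
        if len(pattern) != len(bos):
            raise ValueError(f"variant {pattern!r} does not align with {bos!r}")
        # A hyphen keeps the Bos residue; a letter is a substitution
        expanded = "".join(b if v == "-" else v for b, v in zip(bos, pattern))
        for code in codes.split(";"):
            seqs[TAXON_CODES[code.strip()]] = expanded
    return seqs


def pairwise_counts(cell: str) -> dict:
    """Number of differing positions for each pair of taxa."""
    seqs = parse_cell(cell)
    return {
        (p, q): sum(x != y for x, y in zip(seqs[p], seqs[q]))
        for p, q in combinations(seqs, 2)
    }


if __name__ == "__main__":
    print(pairwise_counts("GEGGPQGPR; --A------(O); --A----A-(S)"))  # 1t18
```

For 1t18 this gives one substitution between Bos and Ovis, two between Bos and Sus, and one between Ovis and Sus. Rows whose main sequence itself contains gap hyphens (e.g., 2t2) would need alignment-aware handling, because this sketch simply compares characters position by position.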
COL1A2 peptide sequences showing amino acid variations between artiodactyl taxa (the main sequence is Bos; a hyphen indicates a residue identical to the main sequence; variant sequences are followed by O = Ovis and S = Sus). (√) indicates that the precursor was observed in at least 2 of 3 fingerprints (PMF data); shaded cells indicate that the peptide was not observed in at least 2 of 3 LC-based results (see Tables S1–S3); a single letter under "Peptide label" indicates a PMF species biomarker from Buckley et al. [29].
| Peptide Label | Sequence | Peptide Label | Sequence |
|---|---|---|---|
| 2t1 | QFDAK; ---G-(O); -Y-G-(S) | 2t45 (C) (√) | GPPGESGAAGPTGPIGSR; -----------A------(S) |
| 2t2 | G-G-GPGPMGLMGPR; -V-A-----------(S) | 2t46 | GPSGPPGPDGNK |
| 2t3 (√) | GPPGASGAPGPQGFQGPPGEPGEPGQTGPAGAR; -----V-----------A---------------(S) | 2t47 (√) | GEPGVVGAPGTAGPSGPSGLPGER; -----L------------------(S) |
| 2t4 | GPPGPPGK | 2t48 | GAAGIPGGK |
| 2t5 | AGEDGHPGK | 2t49 | GEK |
| 2t6 | PGR | 2t50 | GETGLR |
| 2t7 | PGER | 2t51 | GDIGSPGR; --V-----(O); --V-----(S) |
| 2t8 | GVVGPQGAR | 2t52 | DGAR |
| 2t9 | GFPGTPGLPGFK | 2t53 | GAPGAIGAPGPAGANGDR; -----V------------(O; S) |
| 2t10 | GIR | 2t54 | GEAGPAGPAGPAGPR |
| 2t11 | GHNGLDGLK | 2t55 | GSPGER |
| 2t12 | GQPGAPGVK | 2t56 (√) | GEVGPAGPNGFAGPAGAAGQPGAK |
| 2t13 | GEPGAPGENGTPGQTGAR | 2t57 | GER |
| 2t14 | GLPGER | 2t58 | GTK |
| 2t15 | GR | 2t59 | GPK |
| 2t16 | VGAPGPAGAR | 2t60 (√) | GENGPVGPTGPVGAAGPSGPNGPPGPAGSR; -----------------A------------(S) |
| 2t17 | GSDGSVGPVGPAGPIGSAGPPGFPGAPGPK; -N----------------------------(S) | 2t61 (√) | GDGGPPGATGFPGAAGR |
| 2t18 (√) | GELGPVGNPGPAGPAGPR | 2t62 | TGPPGPSGISGPPGPPGPAGK; ------A--------------(O); I--------------------(S) |
| 2t19 | GEVGLPGLSGPVGPPGNPGANGLPGAK; -------V-------------------(S) | 2t63 | EGLR |
| 2t20 (√) | GAAGLPGVAGAPGLPGPR | 2t64 | GPR |
| 2t21 (√) | GIPGPVGAAGATGAR; -----A---------(S) | 2t65 | GDQGPVGR |
| 2t22 | GLVGEPGPAGSK | 2t66 | SGETGASGPPGFVGEK; T--P--A---------(O); T-----------A---(S) |
| 2t23 | GESGNK | 2t67 (G) (√) | GPSGEPGTAGPPGTPGPQGLLGAPGFLGLPGSR |
| 2t24 | GEPGAVGQPGPPGPSGEEGK; -----A-PQ-----------(S) | 2t68 | GER |
| 2t25 (√) | R | 2t69 (D) (√) | GLPGVAGSVGEPGPLGIAGPPGAR |
| 2t26 (√) | GSTGEIGPAGPPGPPGLR; -PN--V--S----------(S) | 2t70 | GPPGNVGNPGVNGAPGEAGR; ----A---------------(S) |
| 2t27 | GNPGSR | 2t71 | DGNPGNDGPPGR; -----S------(S) |
| 2t28 | GLPGADGR | 2t72 | DGQPGHK; ---A---(S) |
| 2t29 | AGV | 2t73 | GER |
| 2t30 (√) | GATGPAGVR; -P-------(S) | 2t74 | GYPGNAGPVGAAGAPGPQGPVGPVGK; -----------------------T--(O); -----P-A-----------A---A--(S) |
| 2t31 | GPNGDSGR | 2t75 (√) | HGNR; --S-(O) |
| 2t32 | PGEPGLMGPR | 2t76 (√) | GEPGPAGAVGPAGAVGPR; -----V------------(O); -------S----------(S) |
| 2t33 | GFPGSPGNIGPAGK | 2t77 | GPSGPQGIR |
| 2t34 (√) | EGPVGLPGIDGR; ---A--------(O;S) | 2t78 | GDK |
| 2t35 (√) | PGPIGPAGAR | 2t79 | GEPGDK |
| 2t36 (√) | GEPGNIGFPGPK | 2t80 | GPR |
| 2t37 | GPSGDPGK; --T-----(O;S) | 2t81 | GLPGLK |
| 2t38 | AGEK; N---(S) | 2t82 (√) | GHNGLQGLPGLAGHHGDQGAPGAVGPAGPR; ----------------------P-------(S) |
| 2t39 (√) | GHAGLAGAR; -------P-(O) | 2t83 | GPAGPSGPAGK; -----T-----(O) |
| 2t40 | GAPGPDGNNGAQGPPGLQGVQGGK; ----------------P-------(S) | 2t84 | DGR |
| 2t41 (E) (√) | GEQGPAGPPGFQGLPGPAGTAGEAGK; -----------------------V--(S) | 2t85 (A) (√) | IGQPGAVGPAGIR; T------------(O; S) |
| 2t42 (E) (√) | PGER | 2t86 | GSQGSQGPAGPPGPPGPPGPPGPSGGGYEFGFDGDFYR; ----------------------------D---------(O); ----------------------------D--YE-----(S) |
| 2t43 (B) (√) | GLPGEFGLPGPAGAR; -------------P-(S) | | |
| 2t44 | GER | 2t87 | A |
Mascot ion scores for the nine taxon-specific peptides observed in all three taxa (m.c. indicates that the peptide was observed only as part of a missed-cleavage product; * marks a value also included because of its similarity to the homologous peptides).
| Peptide Label | Bos | Ovis | Sus |
|---|---|---|---|
| 1t18 | 41 (m.c.) | 35 | 47 |
| 1t67 | 45 | 88 | 48 |
| 1t75 | 46 (m.c.) | 55 (m.c.) | 5/14 (m.c.) * |
| 1t86 | 56 | 38 | 62 |
| 2t1 | 43 (m.c.) | 56 (m.c.) | 69 (m.c.) |
| 2t51 | 35 (m.c.) | 25 | 35 |
| 2t62 | 50 | 37 | 56 |
| 2t66 | 81 | 59 | 80 |
| 2t74 | 80 | 48 | 77 |
Figure 4. Tandem mass spectra of the two highest-scoring α1(I) chain peptides that are unique to each taxon in this study: (A) 1t67; and (B) 1t86.
Figure 5. Tandem mass spectra of the two highest-scoring α2(I) chain peptides that are unique to each taxon in this study: (A) 2t66; and (B) 2t74.
Figure 6. Example tandem mass spectra taken from Mascot output showing matches (numbers coloured red) to the same collagen peptide (2t69), with (A) a variable hydroxylation matched on the 12th residue, compared with (B) a variable hydroxylation on the 14th residue (the peptide significance score for this (Bos) search was 40 and the highest false positive ion score was 31; the false discovery rate above the identity threshold was 2.27%).
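Figure 6 shows why variable hydroxylation can produce false positives: positional isomers share one precursor mass and differ in only a few fragment ions. The sketch below makes several assumptions: singly charged b and y ions, standard monoisotopic residue masses, and P or K as candidate hydroxylation sites. The function names are hypothetical. It enumerates the single-hydroxylation isomers of 2t69 and lists which fragments separate the 12th-residue and 14th-residue assignments.

```python
from itertools import combinations

# Monoisotopic residue masses (Da) for the residues in 2t69
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "I": 113.08406, "E": 129.04259,
    "K": 128.09496, "R": 156.10111,
}
WATER, PROTON, OXYGEN = 18.01056, 1.00728, 15.99491


def hydroxylation_isomers(peptide: str, n_oh: int) -> list:
    """All ways to place n_oh hydroxylations on P or K (1-based positions)."""
    sites = [i for i, aa in enumerate(peptide, start=1) if aa in "PK"]
    return list(combinations(sites, n_oh))


def fragment_ions(peptide: str, oh_sites: set) -> dict:
    """Singly charged b and y ion m/z values for one hydroxylation isomer."""
    masses = [RESIDUE_MASS[aa] + (OXYGEN if i in oh_sites else 0.0)
              for i, aa in enumerate(peptide, start=1)]
    n = len(masses)
    ions = {}
    for i in range(1, n):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{n - i}"] = sum(masses[i:]) + WATER + PROTON
    return ions


def distinguishing_ions(peptide: str, sites_a, sites_b, tol: float = 0.01) -> list:
    """Fragment ions whose m/z differs between two isomers by more than tol."""
    a = fragment_ions(peptide, set(sites_a))
    b = fragment_ions(peptide, set(sites_b))
    diff = [name for name in a if abs(a[name] - b[name]) > tol]
    return sorted(diff, key=lambda name: (name[0], int(name[1:])))


if __name__ == "__main__":
    pep = "GLPGVAGSVGEPGPLGIAGPPGAR"  # 2t69
    print(hydroxylation_isomers(pep, 1))
    print(distinguishing_ions(pep, (12,), (14,)))
```

2t69 has five prolines (positions 3, 12, 14, 20 and 21), so five single-hydroxylation isomers share the same precursor mass. Of the 46 b and y ions, only b12, b13, y11 and y12 differ between the two assignments shown in Figure 6, so whether those ions are present largely determines whether the site can be localized.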