
Identification of collagen 1α3 in teleost fish species and typical collision induced internal fragmentations.

Anne J Kleinnijenhuis1, Frédérique L van Holthoon1, Bastiaan van der Steen2.   

Abstract

In contrast to collagens 1α1 and 1α2, the more obscure collagen 1α3 is sparsely mentioned in literature. In skin collagen type 1 of teleosts (bony fish), however, the chain occurs in a heterotrimer together with collagens 1α1 and 1α2, which makes it one of the most abundant proteins in teleosts. As teleost fish species and gelatin (hydrolysate) prepared from their skin are a major source for food products and nutraceuticals, the goal of the study was to selectively identify collagen 1α3 in several fish species. Fish skin extracts and fish skin gelatins were analyzed using LC-MS. Depending on the amount of available genetic information different approaches were used to identify collagen 1α3. Additionally, collagen-specific collision induced internal fragmentations are discussed, which are important to consider during data analysis. Ultimately the presence of collagen 1α3 could be confirmed using LC-MS in multiple fish species.
© 2022 The Author(s).

Keywords:  Collagen 1α3; Fish; Gelatin; Internal fragmentation; LC-MS; Teleost

Year:  2022        PMID: 35634226      PMCID: PMC9130073          DOI: 10.1016/j.fochx.2022.100333

Source DB:  PubMed          Journal:  Food Chem X        ISSN: 2590-1575


Introduction

Collagens are basic structural proteins (Karsdal, Leeming, Henriksen, & Bay-Jensen, 2016) in both vertebrates and invertebrates (Shahidi, 2007). They provide mechanical stability, strength and toughness to several tissues such as tendon, skin and bone (Fratzl, 2008). The protein family has a long history, originating from before the Cambrian period. Collagen is closely involved with the appearance and evolution of metazoa (Exposito, Cluzel, Garrone, & Lethias, 2002): all animals that undergo development from an embryonic stage with three tissue layers (ectoderm, mesoderm and endoderm) (Technau, & Scholz, 2003). Amino acid sequence comparisons of various collagens indicate that the main types of collagen evolved about 800–900 million years ago (Runnegar, 1985). Choanoflagellate and diploblast genomic data have suggested that the formation of an ancestral α chain occurred before the metazoan radiation (King et al., 2008, Zhang et al., 2007). Phylogenetic studies point to an early emergence of the three fibrillar collagen clades A, B and C before the eumetazoan radiation (Exposito, Valcourt, Cluzel, & Lethias, 2010). While clade B comprises collagen types 5α1, 5α3, 11α1 and 11α2 and clade C collagen types 24 and 27, the clade A collagens include types 1 to 3 and 5α2 (Exposito et al., 2008, Boot-Handford et al., 2003, Sicot et al., 1997). The classification and major characteristics of fibrillar collagens are summarized in Table 1. Clade A collagen type 1 is by far the most abundant protein in vertebrates (Makareeva, & Leikin, 2014) and forms type 1 triple helices (Shoulders, & Raines, 2009). Being formed from procollagen, the triple helices usually consist of heterotrimers of one collagen 1α2 and two collagen 1α1 chains (Bellamy, & Bornstein, 1971); therefore, the latter chains are relatively well known. The more obscure collagen 1α3 chain, which is the main subject of this paper, occurs only in teleosts (bony fish). 
Skin type 1 collagens of several teleosts contain 1α3 in a heterotrimer, together with 1α1 and 1α2, as shown for Alaska pollack (Kimura and Ohno, 1987, Piez, 1965) and common mackerel (Kimura, 1985). From a phylogenetic point of view, it seems likely that the collagen 1α3 gene emerged around the time of the adaptive radiation of bony fish (Kimura, Ohno, Miyauchi, & Uchida, 1987). This explains why collagen 1α3 does not occur in other fish species, amphibians, reptiles, birds or mammals. Results obtained in zebrafish support the hypothesis that the 1α3 chain arose from a duplication of the 1α1 gene (Morvan-Dubois, Le Guellec, Garrone, Zylberberg, & Bonnaud, 2003) and that the original and duplicate genes have diverged ever since. To distinguish and assign collagen 1α1 and 1α3 protein sequences in a fish species, complete genetic information on these proteins should ideally be available. When this information has not been uncovered for a species, or only partly, it is difficult to link elucidated sequences to either collagen 1α1 or 1α3 solely on the basis of LC-MS/MS (liquid chromatography-tandem mass spectrometry) data. In these cases, it can be helpful to search the LC-MS/MS data against species closely related to the analyzed species. Another challenge is finding reliable collagen 1α3 database information at all, as a different nomenclature, collagen 1α1b, is often used. In the NCBI nucleotide database few sequences have collagen 1α3 in the accession title, but several more have collagen 1α1b in the title. During our search for collagen 1α3 sequences it appeared that collagen 1α3 is also described as “collagen 1α1-like” on some occasions. Finally, another consideration is the polyploidy that several fish species exhibit (Leggatt, & Iwama, 2003), which may complicate the naming of proteins, although it would not hinder the identification of collagen 1α3 chains in fish species from a data-analytical point of view.
Table 1

Fibril-forming collagen types 1, 2, 3, 5, 11, 24 and 27 with their corresponding clades and typical presence in important tissues. A larger overview of collagen types and characteristics has been extensively described elsewhere (Duconseille, Astruc, Quintana, Meersman, & Sante-Lhoutellier, 2015).

Type | Clade | Typical presence in
1 | A | Bone, skin, tendon
2 | A | Cartilage
3 | A | Skin
5 | A (5α2), B (5α1, 5α3) | Bone, skin
11 | B | Cartilage
24 | C | Bone
27 | C | Cartilage
Collagen 1α3/1α1b has been detected in proteomic and other studies (Keen et al., 2018, Carlson et al., 2013, Koth et al., 2020), but it is rarely mentioned in the literature. Considering its very high abundance in teleosts, where it has the same stoichiometry as the 1α1 and 1α2 chains in skin, it is an interesting protein for further investigation. Collagen type 1 is a basic constituent of food and an ingredient used to produce several food and nutraceutical products, such as gelatin and collagen hydrolysate. Commercial gelatins are mainly produced from bovine and porcine skins and bones, and consequently these gelatins are mainly composed of partly hydrolyzed type 1 collagen (Stevens, 2010, Vergauwen et al., 2016, Yasui et al., 1984, Kleinnijenhuis et al., 2018). Another source for gelatin production is the skin of several fish species (Vergauwen et al., 2016). Gelatin is typically further processed chemically and/or enzymatically to produce collagen hydrolysate nutraceutical products. Because teleost fish species are a major food source and several food and nutraceutical products are prepared from their skins, collagen 1α3 is an important protein that merits further investigation. The source protein composition of many mammalian collagen (hydrolysate) products is quite well known, as they contain collagen 1α1 and 1α2 in a 2:1 molar ratio. In addition, it is often known which other collagen types might be present in raw material tissues, such as collagen 2α1 (abundant in cartilage) or collagen 3α1 (abundant in skin).
To shed more light on the composition of fish gelatin (hydrolysates) we decided to selectively investigate the presence of collagen 1α3 in skin extracts and skin gelatin samples of several fish species, using ultra high performance LC-MS/MS (UHPLC-MS/MS) with an Orbitrap analyzer. It is essential to use a high-resolution mass analyzer during collagen MS/MS sequence analysis, to discriminate between hydroxyproline and leucine/isoleucine residues, which differ only 0.036 Da in mass. Fish species or families can often be readily assigned using elucidated collagen 1α1 and 1α2 sequences, but the more challenging goal of this study was to identify collagen 1α3. The examined fish species were categorized based on the amount of available genetic information, and the approach for finding collagen 1α3 was adapted to this aspect. After having obtained a global overview of the similarities and relations between clade A fibrillar collagens, the LC-MS investigation of fish skin collagens is discussed, starting with species for which complete genetic collagen 1α3 information was available to species with hardly any information. Additionally, collision induced dissociation (CID) of tryptic collagen peptides is discussed, featuring collagen-specific typical internal fragmentations. Ultimately the presence of collagen 1α3 could be confirmed using LC-MS in multiple fish species.
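As a rough illustration of the mass difference mentioned above, the monoisotopic residue masses of hydroxyproline (C5H7NO2) and leucine/isoleucine (C6H11NO) can be computed from their elemental compositions. The sketch below is not part of the described workflow; the function names are illustrative, and the m/Δm value is only a crude indicator of the resolving power needed for a singly charged ion.

```python
# Monoisotopic atomic masses in Da (standard values)
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def residue_mass(composition):
    """Sum monoisotopic atomic masses for an elemental composition dict."""
    return sum(MONO[element] * count for element, count in composition.items())

HYP = {"C": 5, "H": 7, "N": 1, "O": 2}   # hydroxyproline residue
LEU = {"C": 6, "H": 11, "N": 1, "O": 1}  # leucine / isoleucine residue

# Mass difference between the two residues (about 0.036 Da)
delta = residue_mass(LEU) - residue_mass(HYP)

def min_resolving_power(mz, delta_m=delta):
    """Crude m/delta-m estimate (illustrative helper, not an instrument spec)."""
    return mz / delta_m

print(f"Leu/Ile - Hyp = {delta:.4f} Da")
print(f"m/dm at m/z 1000 = {min_resolving_power(1000):.0f}")
```

The difference corresponds to exchanging CH4 for O, which is why a low-resolution analyzer cannot separate fragment ions that differ only in a Hyp versus Leu/Ile residue.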

Materials and methods

Gelatin samples and collagen extraction

Type A (acid pre-treated) fish skin gelatin samples (pangasius, tilapia, cod and salmon) were kindly provided by Rousselot. Other fish (barramundi, sea bream, trout, hake, cod, saithe/pollack, mackerel, rose fish, haddock, salmon, sea bass and sardine) were purchased at local supermarkets. Collagen from these fish was extracted from the skins based on the procedure described by Gudmundsson and Hafsteinsson (Gudmundsson, & Hafsteinsson, 1997) and Koli et al. (Koli, Basu, Gudipati, Chouksey, & Nayak, 2013). In short, skins (approximately 1 g) were cleaned and rinsed with water to remove excess material and treated for 40 min respectively with 0.2% (w/v) sodium hydroxide, 0.2% (w/v) sulfuric acid and 1.0 % (w/v) citric acid solutions. After each treatment, the skins were washed under running tap water until the pH was between 6.5 and 7.5. The soaking and washing treatments were repeated three times. Finally, collagen was extracted from skin in 7 ml milliQ water at 50 °C for 18 h. After solubilization, reduction, alkylation and tryptic digestion the samples were analyzed using UHPLC-MS/MS.

LC-MS analysis

Samples were analyzed using a combination of a UHPLC (Ultimate 3000, Dionex) and a Q-Exactive mass spectrometer (ThermoElectron). An Acquity HSS T3 column (2.1 × 100 mm, 1.8 µm, Waters, Milford, PA, USA) was used at a temperature of 40 °C. Elution was achieved using a binary gradient from 2% to 30% B at a flow rate of 0.5 ml minute−1 with solvents A and B, both containing 0.1% formic acid in respectively milliQ water and acetonitrile, followed by a column wash and equilibration as part of the gradient, to prepare for the next run. The autosampler temperature was 40 °C and the injection volume was 10 μl. The total run time was 18 min. All peptides were analyzed using electrospray ionization in positive mode (HESI source) using a full-scan data-dependent method. The mass range was set to m/z 200–2000 at a resolution of 35,000. The top 5 ions were submitted to data-dependent scans at a normalized collision energy of 15, 25 and 35. The spray voltage was 3.0 kV. Other settings were: sheath gas flow (60 AU), auxiliary gas flow (20 AU), capillary temperature (320 °C), heater temperature (350 °C), S-Lens RF Level (50 V), AGC target (1e6), and maximum IT (150 ms). XCalibur software version 3 (ThermoScientific) was used for data acquisition.

Data analysis

Database analysis of the raw files was performed in Proteome Discoverer 1.4 (Thermo) and included manual inspection/confirmation. Sequences of 48 collagen 1α3 or 1α1b mRNA or cDNA entries were obtained from the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore/advanced), accessed in April-June 2021 (see Supplementary Table 1). The sequences were translated to protein to compose a fasta file. Collagen 1α3/1α1b sequences were only included in the fasta file if the GXY domain was 1014 codons long, to promote the inclusion of high-quality sequences (Kleinnijenhuis, 2019). An overview of other collagen data sources is provided in Supplementary Table 2. From the Clupea harengus sequence XP_031416510.1 one unidentified amino acid was deleted as it disrupted the GXY pattern. The relatedness of sequences was assessed using Clustal Omega analysis (Sievers et al., 2011). Database searching parameters included full tryptic digestion with up to one missed cleavage allowed; the precursor mass tolerance was set at 10 ppm and the fragment mass tolerance at 0.02 Da. Carbamidomethylation (C) was set as a fixed modification, although cysteine is not expected to occur in collagen. Oxidation (M, P and K) was set as a variable modification. The false discovery rate at the peptide level was less than 1%. Only peptides labeled ‘high confidence’ were considered (q-value < 0.01).
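The GXY-based inclusion criterion lends itself to a simple check. The following is a minimal sketch, not the procedure actually used in this study: it finds the longest stretch of a translated protein sequence in which every third residue is glycine, so that its length can be compared with the 1014 residues expected for the GXY domain. The function name `longest_gxy_run` and the toy sequences are illustrative assumptions.

```python
def longest_gxy_run(protein):
    """Return (start, length) of the longest stretch of consecutive G-X-Y triplets.

    All three reading frames are scanned; a triplet counts when its first
    residue is glycine. Positions are 0-based.
    """
    best_start, best_len = 0, 0
    for frame in range(3):
        run_start, run_len = frame, 0
        for i in range(frame, len(protein) - 2, 3):
            if protein[i] == "G":
                if run_len == 0:
                    run_start = i
                run_len += 3
                if run_len > best_len:
                    best_start, best_len = run_start, run_len
            else:
                run_len = 0  # pattern interrupted, start counting again
    return best_start, best_len

# Toy sequences (hypothetical, far shorter than a real collagen chain)
intact = "GPPGPPGPPGPP"
disrupted = "GPPGPPXGPPGPP"  # one unidentified residue breaks the pattern

print(longest_gxy_run(intact))
print(longest_gxy_run(disrupted))
```

In the toy example the single inserted residue halves the longest uninterrupted run, which mirrors why one unidentified amino acid had to be deleted from the Clupea harengus sequence before it fitted the pattern.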

Results and discussion

To visualize the similarities and relations between fibrillar collagens of clade A, the number of nucleotide differences between the GXY domain cDNA sequences was calculated for collagens 1–3 of cow, pig, chicken, coelacanth and tilapia. The numbers of mutual nucleotide differences were set out in a matrix and collagen 1α3/1α1b sequences of several fish species were added, see Fig. 1. The accessions of the compared species are mentioned in Supplementary Table 1 and Supplementary Table 2. Collagen 5α2 was excluded from this comparison due to its low similarity to collagens 1–3 (results not shown). Not every collagen type is represented for the selected animal species in Fig. 1, because specific collagen types do not occur in an animal species or due to absent information. It should be noted that the matrix, showing similarities on the nucleotide level, does not linearly translate to similarity on the amino acid level or to evolutionary divergence time and rate (Kleinnijenhuis, 2019). Analysis of Fig. 1 resulted in the following findings:
Fig. 1

Distance table of 32 compared animal species and collagen types, regarding the collagen GXY domain, indicating, from green via yellow to red, the increasing amount of nucleotide differences.
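The principle behind such a distance table can be sketched as a pairwise count of nucleotide differences between equally long, aligned GXY domain cDNA sequences. The code below is an illustrative sketch under that assumption; the sequence names and the short toy sequences are hypothetical and do not come from the accessions used in this study.

```python
from itertools import combinations

def nucleotide_differences(seq_a, seq_b):
    """Count positions at which two aligned, equally long sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def distance_matrix(sequences):
    """Build a symmetric nested dict of pairwise nucleotide differences."""
    matrix = {name: {name: 0} for name in sequences}
    for name_a, name_b in combinations(sequences, 2):
        d = nucleotide_differences(sequences[name_a], sequences[name_b])
        matrix[name_a][name_b] = d
        matrix[name_b][name_a] = d
    return matrix

# Hypothetical 9-nucleotide fragments (three GXY codons each)
toy = {
    "chain_a": "GGTCCTCCT",
    "chain_b": "GGTCCACCT",
    "chain_c": "GGAGCTCCC",
}

matrix = distance_matrix(toy)
for name, row in matrix.items():
    print(name, row)
```

As noted in the text, such counts are made at the nucleotide level and do not translate linearly to amino acid similarity or divergence time.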

a) The relations between fish 1α3 and 1α1b sequences versus the relations with the collagen 1α1 sequences indicate that the chain names 1α3 and 1α1b are indeed synonyms. The same conclusion was drawn from a similar extended comparison of all the accessions mentioned in Supplementary Table 1 (results not shown) and a Clustal Omega analysis of all the sequences (translated to protein sequences) from Supplementary Table 1 and part of the sequences in Supplementary Table 2, see Supplementary Fig. 1. In the remainder of the document the chain of interest will be denoted as collagen 1α3. Additionally, the Clustal Omega analysis revealed that the selected collagen 1α1 sequences formed an individual cluster within the set of investigated sequences.
b) The similarities between the collagen 1α3 and 1α1 sequences indicate that collagen 1α3 did not emerge long before teleost radiation, confirming earlier findings (Kimura, Ohno, Miyauchi, & Uchida, 1987). Therefore, it is difficult to discriminate between collagen 1α1 and 1α3 in fish species when there is limited genetic or protein sequence information available.
c) On the cDNA level, collagen 1α3 and 1α1 are most similar to collagen 2α1, followed by 1α2 and 3α1.
d) The added cod (Gadus morhua) collagen 1α1-like sequence XM_030339720.1, which will be discussed in more detail below, is more similar to the collagen 1α3 sequences than to the collagen 1α1 sequences. It is probable that this cod collagen 1α1-like sequence thus represents collagen 1α3, which makes it reasonable to explore the feasibility of using collagen 1α1-like sequences in the search for collagen 1α3.
Due to finding b) it was decided to sort the analyzed fish species in categories I to IV.
The categorization was based on the available collagen 1α3 sequence information: I) sequence information available of the species (tilapia), II) sequence information available of family member (trout, sea bream, rose fish, salmon, sea bass, pangasius), III) sequence information available of order member (barramundi, mackerel) and IV) no sequence information available of order member (hake, cod, saithe, haddock, sardine). Per category different approaches were used to identify collagen 1α3. To assign collagen 1α3, software-based data analysis was applied, followed by manual confirmation. Due to the collagen-specific internal fragment ion series that are typically observed, software-based data analysis results must be carefully assessed. Additionally, wrong assignment of the location of hydroxyproline (Hyp or p) residues is often observed. The potential of incomplete sequence information in MS/MS and wrongly assigned modification sites in relation to permutations and isomeric combinations, together with erroneous assignment of internal fragment ions as fragment ions containing a terminus, can be problematic. These issues especially apply to collagens as their amino acid sequences contain many slightly different repetitions (Kleinnijenhuis et al., 2020). Besides extensive manual inspection of MS/MS data, it is an option to improve proteomic algorithms or models (Yang et al., 2021), but the specific collagen properties and behavior in LC-MS/MS, as outlined above, should be taken into account to reduce the risk of false-positive identifications. The only category I fish species (sequence information available of the species) was tilapia (Oreochromis niloticus). The aligned GXY domain amino acid sequences of tilapia collagen 1α1, 1α2, 1α3 and 2α1 are shown in Fig. 2. The highlighted trypsin cleavage sites exemplify the similarity between the chains. 
It was decided to focus the analysis on a part of the chains, which has highly conserved (preceding) trypsin cleavage sites, also considering their location in other animal species and collagen types mentioned in Supplementary Table 1 and Supplementary Table 2. Between the (preceding) trypsin cleavage sites, however, this part of the chain does exhibit amino acid changes between collagen types of the same species, which is important to discriminate collagen 1α3 and 1α1. The part of the sequence is highlighted in bold in Fig. 2 and represents residues 220–237 of the GXY domain. A combined extracted ion chromatogram of the analogous peptides from tilapia collagen 1α1, 1α2 and 1α3 is presented in Fig. 3. Although the chromatogram shows qualitative results, the analogous peptides are expected to have a relatively similar response factor as their sequences are similar, and therefore the MS results indicate the expected 1:1:1 stoichiometry of the three chains in tilapia skin. The sequences of the tilapia peptides were confirmed by MS/MS, see Fig. 4, which also illustrates the collagen-specific internal fragmentations. Like proline (Breci, Tabb, Yates, & Wysocki, 2003), hydroxyproline exhibits intense collision induced backbone fragmentations on the N-terminal side of the residue. A majority of the collagen trypsin cleavage sites is located C-terminally of the third GXY position and therefore many tryptic collagen peptides have an N-terminal glycine residue. Hyp residues are very often located in the third GXY position. As a result, a large part of the tryptic collagen peptides has Hyp in the third position from the N-terminus, which is also the case for the tilapia collagen 1α1 and 1α3 peptides shown in Fig. 4. In the remainder of the document we designate these peptides as Hyp-3 peptides. 
When Hyp-3 peptides with n residues are fragmented using CID an internal y(n-2)/b fragment ion series is produced starting with the pG+ ions at m/z 171.076 (Kleinnijenhuis, Van Holthoon, & Herregods, 2018). For the tilapia collagen 1α1 peptide the series extends to pGAAGVAGA+ and for the tilapia collagen 1α3 peptide to pGAAGIAGA+. Regularly, y(n-2)/a-type internal fragments, which are 27.995 Da (–CO) lower in mass, can be observed as well, but usually with lower intensity than the y(n-2)/b internal fragments. Finally, an extensive y ion series is typically present up to y(n-2). The CID spectrum of the analogous tilapia collagen 1α2 peptide, which is not a Hyp-3 peptide, does not show the typical Hyp-3 related internal fragment ions, but relatively more b type ions, which contain the N-terminus. The spectrum does contain internal fragment ions starting with pG+, but they originate from the Hyp residue at the sixth position. Internal fragment ion series for Hyp-6 or Hyp-9 peptides are observed regularly, but with lower intensity than for Hyp-3 peptides, which indicates that especially the loss of the two N-terminal amino acids is favorable when Hyp is in the third position. Overall, it was concluded that tryptic collagen peptide CID fragmentation is strongly influenced by the presence of Hyp, especially when the modified amino acid is in the third position.
Collagen 1α3 was clearly identified in tilapia. The elucidated tilapia peptide sequences did not provide hits with other tilapia proteins when searched against taxid (taxonomic identifier) 8128 (Oreochromis niloticus) using protein blast, which further supports the assignments. All the collagen 1α3 identifications are summarized in Table 2; it should be specifically mentioned that often the commercial names of fish, sold as food or used as a basis for food products, do not directly relate to single species names as different species are sold under a common name (e.g. salmon), or could even represent mixtures of species (e.g. sardine). The peptide sequences in Table 2 were confirmed by assessing (low and high intensity) b and y ions as well as Hyp-3 peptide internal fragments including pG+.
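The internal fragment ion series described above can be reproduced numerically. The sketch below, which is an illustration and not the software used for data analysis, computes singly charged b-type internal fragments that start at the first Hyp residue of a peptide, using standard monoisotopic residue masses and the lowercase p notation for Hyp. The function names and the `max_len` parameter are assumptions introduced for this example.

```python
PROTON = 1.007276   # mass of a proton in Da
CO = 27.994915      # mass of CO, the difference between b- and a-type ions

# Standard monoisotopic residue masses in Da; lowercase p denotes hydroxyproline
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "I": 113.084064, "L": 113.084064,
    "K": 128.094963, "F": 147.068414, "R": 156.101111, "p": 113.047679,
}

def hyp_position(peptide):
    """1-based position of the first Hyp residue, or None if there is none."""
    idx = peptide.find("p")
    return idx + 1 if idx >= 0 else None

def internal_b_series(peptide, max_len=9):
    """Singly charged b-type internal fragments starting at the first Hyp."""
    start = peptide.find("p")
    if start < 0:
        return []
    series = []
    # Internal fragments must stop before the C-terminal residue
    last_end = min(start + max_len, len(peptide) - 1)
    for end in range(start + 2, last_end + 1):
        fragment = peptide[start:end]
        mz = sum(RESIDUE[res] for res in fragment) + PROTON
        series.append((fragment, round(mz, 3)))
    return series

def a_type(b_mz):
    """m/z of the corresponding a-type ion (loss of CO from the b-type ion)."""
    return b_mz - CO

tilapia_1a3 = "GApGAAGIAGApGFpGAR"
tilapia_1a2 = "GAAGTpGVAGApGFpGPR"

for fragment, mz in internal_b_series(tilapia_1a3):
    print(f"{fragment:<10} {mz:.3f}")
print("First Hyp in 1a2 peptide at position", hyp_position(tilapia_1a2))
```

The first entry of the series is pG+ at m/z 171.076, and the series for the tilapia collagen 1α3 peptide is generated up to pGAAGIAGA+, in line with the description in the text.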
Fig. 2

Aligned GXY domain sequences of tilapia (Oreochromis niloticus) collagen 1α1, 1α2, 1α3 and 2α1. Trypsin cleavage sites are in blue (K) and green (R). Proline residues C-terminal of K and R, resulting in tryptic miscleavage, are in orange. The MS/MS spectra of the peptides in bold (at position 220–237) are shown in Fig. 4.

Fig. 3

Combined extracted chromatogram of analogous peptides from tilapia collagen 1α1 (GApGAAGVAGApGFpGPR), collagen 1α2 (GAAGTpGVAGApGFpGPR) and collagen 1α3 (GApGAAGIAGApGFpGAR), present in tilapia skin gelatin.

Fig. 4

MS/MS spectra of tryptic tilapia collagen 1 peptides from tilapia skin gelatin; a) 1α1, b) 1α2 and c) 1α3.

Table 2

Manually confirmed collagen 1α3 peptide identifications. In the third column the protein blast results are summarized.

Animal species (category) | Elucidated peptide | Searched against, description 100% hits
Barramundi (III) | GPAGAQGGVGApGPK | Perciformes (taxid 8111), no hits.
Sea bream (II) | EGSQGHDGApGR | Perciformes (taxid 8111), no hits. Hits with collagen 1α1b of Acanthopagrus latus and with collagen 1α1-like and 2α1-like of Sparus aurata when searched against Sparidae (taxid 8169).
Trout (II) | GSTGAAGISGApGFpGTR | Salmoniformes (taxid 8006), hit with collagen 1α1(b)(-like) of several trout and salmon species. Hit with collagen 1α1b of Oncorhynchus mykiss (taxid 8022), but not with collagen 1α1 of Oncorhynchus mykiss.
Sardine (IV) | GATGSpGIAGApGFpGPR | Clupeiformes (taxid 32446), hit with collagen 1α1-like of Clupea harengus (XP_031416510.1).
Hake (IV) | EGSTGHDGApGR | Gadiformes (taxid 8043), hit with collagen 1α1-like of Gadus morhua (XP_030195580.1 (same as translated sequence XM_030339720.1)).
Cod (IV) | GPAGAQGGLGApGPK | Gadiformes (taxid 8043), hit with collagen 1α1-like of Gadus morhua (XP_030195580.1).
Saithe (IV) | GPAGAQGGLGApGPK | See cod.
Mackerel (III) | GGAGPpGATGFpGPAGR | Perciformes (taxid 8111), hits with collagen 1α1b of Sebastes umbrosus, Epinephelus lanceolatus and Cyclopterus lumpus.
Rose fish (II) | GGpGSSGIAGApGFpGSR | Scorpaeniformes (taxid 8111), hit with collagen 1α1b of Sebastes umbrosus.
Haddock (IV) | GVTGSpGSpGPDGK | Gadiformes (taxid 8043), hit with collagen 1α1-like of Gadus morhua (XP_030195580.1).
Salmon (II) | GSTGAAGISGApGFpGTR | See trout.
Sea bass (II) | GNNGDHGApGPK | Perciformes (taxid 8111), no hits. Hits with collagen 1α3 of Dicentrarchus labrax and collagen 1α1b of Morone saxatilis when searched against Moronidae (taxid 42148).
Pangasius (II) | GSpGPAGITGApGFpGTR | Siluriformes (taxid 7995), hit with collagen 1α1b of Pangasianodon hypophthalmus.
Tilapia (I) | GApGAAGIAGApGFpGAR | Cichliformes (taxid 1489911), hits with collagen 1α1b of Neolamprologus brichardi, Oreochromis aureus, Oreochromis niloticus and with collagen 1α1-like of Archocentrus centrarchus.
For several other analyzed fish species, it was not possible to find the tryptic collagen 1α3 peptide containing residues 220–237 of the GXY domain. In these cases it was decided to resort to a different peptide, under the condition that the analogous tilapia collagen 1α3 and 1α1 peptides had the same (preceding) trypsin cleavage locations and that there were at least two mutual amino acid differences between the analogous tilapia collagen 1α3 and 1α1 peptides. It was assumed that when peptides were matched exactly to a collagen 1α3 sequence using these criteria, this indicated the presence of collagen 1α3 with high probability. Finally, elucidated peptide sequences of analyzed fish species were searched against the relevant species, family and/or order using protein blast, see Table 2. Of course, the number of hits after this type of search depends heavily on the proteome content of a database. The category II species (sequence information available of family member) were trout, sea bream, rose fish, salmon, sea bass and pangasius. Especially for category II species the search of MS/MS data against the composed collagen 1α3 fasta file was helpful.
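The criterion of at least two amino acid differences between analogous collagen 1α1 and 1α3 peptides can be checked with a position-wise comparison. The sketch below is illustrative only and uses the tilapia peptides from position 220–237 as example input; the function names and the `min_diff` parameter are assumptions introduced here.

```python
def differences(peptide_a, peptide_b):
    """List (position, residue_a, residue_b) for each differing 1-based position."""
    if len(peptide_a) != len(peptide_b):
        raise ValueError("analogous peptides must have the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(peptide_a, peptide_b))
        if a != b
    ]

def is_discriminating(peptide_a, peptide_b, min_diff=2):
    """True when the peptides differ in at least min_diff positions."""
    return len(differences(peptide_a, peptide_b)) >= min_diff

tilapia_1a1 = "GApGAAGVAGApGFpGPR"
tilapia_1a3 = "GApGAAGIAGApGFpGAR"

print(differences(tilapia_1a1, tilapia_1a3))
print(is_discriminating(tilapia_1a1, tilapia_1a3))
```

Note that this comparison treats leucine and isoleucine as different residues although they have identical masses, so a difference that rests only on a Leu/Ile exchange would need additional evidence.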
As the fasta file contained information of family members of category II species, the search provided many hits with high confidence, which were used as a basis for confirmation of collagen 1α3 peptides. Collagen 1α3 could be identified in all the category II species and the peptide structures which were confirmed, are summarized in Table 2. Protein blast did not return relevant hits with other proteins (of the same species). Only for sea bream a collagen 2α1-like hit was obtained with Sparus aurata, see Table 2, which appears to be collagen 1α1-like as the gene names in the entries of the 1α1-like and 2α1-like hits were the same. Imprecise protein annotations are a complicating factor during data analysis. In Supplementary Fig. 2, MS/MS spectra are presented for two of the category II species, pangasius and salmon, of tryptic peptides also originating from position 220–237 of the GXY domain, confirming the presence of collagen 1α3 in the skin of these fish species. The pangasius peptide is a Hyp-3 peptide, again showing the typical internal fragment ion series, whilst the salmon peptide does not clearly show the typical fragmentation as it is not a Hyp-3 peptide. The elucidated pangasius peptide sequence did not provide hits with other proteins when searched against taxid 7999 (Pangasiidae) using protein blast. The salmon peptide gave hits with collagen 1α1b and collagen 1α1-like of several salmon and trout species when searched against taxid 8015 (Salmonidae). In addition hits were obtained with collagen 1α1 of Salvelinus alpinus (XP_023849227.1) and Oncorhynchus kisutch (XP_020316038.1), which is peculiar. The latter protein sequences were subjected to Clustal Omega analysis, together with a control Oncorhynchus mykiss collagen 1α1 sequence (NM_001124177.1) in addition to the set reported in Supplementary Fig. 1. 
The Salvelinus alpinus (XP_023849227.1) and Oncorhynchus kisutch (XP_020316038.1) sequences, annotated as collagen 1α1, clustered with collagen 1α3 sequences, while the control Oncorhynchus mykiss collagen 1α1 sequence (NM_001124177.1) clustered with other collagen 1α1 sequences (results not shown). This finding is another indication that collagen annotations are not always precise. Because the salmon peptide GSTGAAGISGApGFpGTR gave collagen 1α3 hits with several Salmonidae, a different target should be used to distinguish salmon from related trout species. Nevertheless, the peptide is sufficient to confirm the presence of collagen 1α3. Again, the goal of this study was not to show how to discriminate (closely related) species, but to identify collagen 1α3 in fish species. When selectively investigating the presence of salmon and related species, it is advisable to select a less conserved target.
The category III species (sequence information available of order member) were barramundi and mackerel. For these species it was harder to identify collagen 1α3. This is illustrated in Table 3, which summarizes the highest sequence coverages per analyzed fish sample and the corresponding species in the fasta file. For the analyzed species without a family member in the fasta file there were few peptide hits with high confidence. Fortunately, the barramundi and mackerel data gave suitable hits with other, less related species and therefore collagen 1α3 could be identified and confirmed, see Table 2. From the barramundi and mackerel results it was deduced that sequence information of order members is substantially less suitable than information of family members, which is not surprising given the potentially long divergence times between order members.
Table 3

Summary of the highest sequence coverages obtained for the analyzed fish samples, searched against the collagen 1α3 fasta file (see Supplementary Table 1).

Animal species (product) | Order | Family | Highest sequence coverage | Species with highest sequence coverage
Barramundi | Perciformes | Latidae | 33.0 | Toxotes jaculatrix
Sea bream | Perciformes | Sparidae | 47.8 | Acanthopagrus latus
Trout | Salmoniformes | Salmonidae | 59.5 | Oncorhynchus mykiss
Sardine | Clupeiformes | Clupeidae | 11.2 | Carassius auratus
Hake | Gadiformes | Gadidae | 14.6 | Amphiprion ocellaris
Cod | Gadiformes | Gadidae | 15.9 | Epinephelus lanceolatus
Saithe | Gadiformes | Gadidae | 15.6 | Nothobranchius kuhntae 2
Mackerel | Perciformes | Scombridae | 17.7 | Micropterus salmoides
Rose fish | Scorpaeniformes | Sebastidae | 47.5 | Sebastes umbrosus
Haddock | Gadiformes | Gadidae | 12.7 | Nothobranchius kuhntae 2
Salmon | Salmoniformes | Salmonidae | 42.2 | Oncorhynchus mykiss
Sea bass | Perciformes | Moronidae | 48.2 | Morone saxatilis
Pangasius (gelatin) | Siluriformes | Pangasiidae | 62.8 | Pangasianodon hypophthalmus
Tilapia (gelatin) | Cichliformes | Cichlidae | 61.2 | Oreochromis niloticus
Cod (gelatin) | Gadiformes | Gadidae | 13.8 | Nothobranchius kuhntae 2
Salmon (gelatin) | Salmoniformes | Salmonidae | 36.7 | Salvelinus namaycush
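For readers unfamiliar with the metric in Table 3, sequence coverage can be sketched as the fraction of residues of a database protein that is covered by at least one identified peptide. The code below is a simplified illustration, not the calculation performed by the search software; it expresses coverage as a percentage (an assumption about the unit used in Table 3), maps Hyp (p) back to the encoded proline, and uses a hypothetical toy protein built from peptides in Table 2 plus an invented linker.

```python
def sequence_coverage(protein, peptides):
    """Percentage of protein residues covered by at least one peptide."""
    if not protein:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        plain = peptide.upper()  # 'p' (Hyp) maps back to the encoded 'P'
        start = protein.find(plain)
        while start != -1:
            for i in range(start, start + len(plain)):
                covered[i] = True
            start = protein.find(plain, start + 1)
    return 100.0 * sum(covered) / len(protein)

# Hypothetical toy protein: two peptide regions joined by an invented linker
toy_protein = "GAPGAAGIAGAPGFPGAR" + "GEAGPK" + "GPAGAQGGLGAPGPK"
identified = ["GApGAAGIAGApGFpGAR", "GPAGAQGGLGApGPK"]

coverage = sequence_coverage(toy_protein, identified)
print(f"{coverage:.1f}")
```

Short peptides that occur in many chains raise this number without being specific, which is why low coverages built from short peptides carry little evidential weight.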
The category IV species (no sequence information available of order member) were hake, cod, saithe, haddock and sardine. Considering the relatively low number of suitable hits even for the category III species, it could be expected that the category IV species would not give any suitable hits, as illustrated in Table 3. The low sequence coverages usually consist of relatively short peptides, which are therefore not unique and occur with the same sequence across species and even across collagen types. Although initially only sequences with 1α3 and 1α1b in the accession title were considered, it was decided to explore the use of collagen 1α1-like sequences for the category IV species. To facilitate identification for the Gadiformes hake, cod, saithe and haddock, two different Gadus morhua collagen 1α1-like sequences were used: XM_030339720.1 (mRNA sequence) and XP_030196341.1 (only protein). Clustal Omega analysis indicated that XM_030339720.1 clusters with collagen 1α3 and XP_030196341.1 with collagen 1α1, see Supplementary Fig. 1. Collagen 1α1-like is therefore not a synonym of collagen 1α3 and 1α1b. In Supplementary Fig. 3, an MS/MS spectrum is presented from a collagen 1α1-like peptide (from accession XM_030339720.1) occurring in the cod gelatin sample. The identified peptide GPAGAQGGLGApGPK differs from the analogous part in XP_030196341.1 by 4 amino acids. The elucidated cod peptide sequence only provided a hit with (translated) XM_030339720.1 when searched against taxid 8043 (Gadiformes) using protein blast. Therefore, it is probable that collagen 1α3 was identified in cod. The peptide GPAGAQGGLGApGPK was also detected in a deamidated form, at residue Q6. Accession XM_030339720.1 could be used to find specific peptides in the other Gadiformes hake, saithe and haddock, see Table 2.
Similarly, the Clupea harengus collagen 1α1-like sequence XP_031416510.1 (only protein) clustered with collagen 1α3, see Supplementary Fig. 1, and could be used to extract information for sardine. Using this approach a specific peptide for sardine could be identified and therefore it is probable that collagen 1α3 was identified in sardine, see Table 2. However, accurately annotated genetic data are required to fully confirm these results, especially for category IV species.

Collagen 1α3 could be identified in several teleost fish species, with varying degrees of confidence. Depending on the amount of available genetic information it was important to use different approaches to identify the collagen chain. It is expected that collagen 1α3 occurs in many other teleost fish species. Additionally, collagen-specific collision induced internal fragmentations were described. During the interpretation of MS/MS spectra it is very important to consider these fragmentations to facilitate elucidation of the sequence and to avoid wrong protein and peptide assignments. Collagen 1α3 was identified not only in fish skin extracts, but also in fish skin gelatin, which is a food product. Its high abundance, having the same stoichiometry as the 1α1 and 1α2 chains in fish skin, underlines that the protein is more important for food consumers than suggested by its prominence in literature. Although collagen 1α3 is similar to collagen 1α1, its presence will affect food (product) properties, such as amino acid content. Tilapia collagen 1α3, for example, contains significantly less methionine in the chain than tilapia collagen 1α1 and significantly more histidine, serine and isoleucine. This will also affect the population of collagen di- and tripeptides which survive the gastrointestinal tract and brush border peptidase activity, to finally enter the blood (Kleinnijenhuis et al., 2020). Due to evolutionary divergence, however, the effect on amino acid content will be different for each fish species.

Conclusion

Different approaches were used to identify collagen 1α3 in the skin of teleost fish species. Depending on the amount of available genetic information the chain could be identified with varying degrees of confidence. Due to the specific collagen properties and behavior in LC-MS/MS it is essential to also manually interpret MS/MS spectra of tryptic collagen peptides, to avoid wrong protein and peptide assignments. We described the occurrence of collagen-specific internal y(n-2)/b and y(n-2)/a fragment ion series, which are especially abundant when the third residue is a hydroxyproline (Hyp-3 peptides). Collagen 1α3 was identified not only in teleost fish skin extracts, but also in fish skin gelatin, which is a food product. Although it is similar to collagen 1α1, the presence of collagen 1α3 will affect food (product) properties, such as amino acid content and the population of collagen di- and tripeptides which can enter the blood. Therefore, it is important to consider the presence of collagen 1α3 in fish (products) and nutraceuticals prepared from fish skin collagen.
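The internal fragment series named above can be sketched in Python as follows. This is one reading of the notation, not the authors' implementation: it assumes that a y(n-2)/b ion is a singly charged internal fragment that starts at the third residue (the N-terminus of the y(n-2) ion) and ends before the C-terminal residue, and that the corresponding y(n-2)/a ion is the same fragment minus CO. The peptide "GPpGAK" is a hypothetical Hyp-3 toy sequence (lowercase "p" for hydroxyproline), not a peptide from the study, and `internal_fragments` is an illustrative name.

```python
# Standard monoisotopic residue masses (Da); "p" = hydroxyproline (assumption).
RESIDUE_MASS = {
    "G": 57.021464,
    "A": 71.037114,
    "P": 97.052764,
    "K": 128.094963,
    "p": 97.052764 + 15.994915,
}
PROTON = 1.007276
CO = 27.994915  # mass difference between a b-type and an a-type ion


def internal_fragments(sequence, start=3):
    """Return (fragment, b_mz, a_mz) for internal ions beginning at residue `start`.

    Residue numbers are 1-based. Fragments run from `start` up to, but not
    including, the C-terminal residue; all ions are singly charged.
    """
    fragments = []
    running = 0.0
    for end in range(start, len(sequence)):
        running += RESIDUE_MASS[sequence[end - 1]]
        b_mz = running + PROTON
        fragments.append((sequence[start - 1:end], b_mz, b_mz - CO))
    return fragments


# Hypothetical Hyp-3 peptide used only to exercise the function.
for fragment, b_mz, a_mz in internal_fragments("GPpGAK"):
    print(f"{fragment:>4}  y(n-2)/b = {b_mz:.4f}  y(n-2)/a = {a_mz:.4f}")
```

Listing these expected m/z values next to the regular b and y series is one way to check, during manual interpretation, whether unexplained peaks in a spectrum belong to an internal series rather than to a different peptide.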

CRediT authorship contribution statement

Anne J. Kleinnijenhuis: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. Frédérique L. van Holthoon: Conceptualization, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Bastiaan van der Steen: Conceptualization, Resources, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  24 in total

1.  Phylogenetic analysis of vertebrate fibrillar collagen locates the position of zebrafish alpha3(I) and suggests an evolutionary link between collagen alpha chains and hox clusters.

Authors:  Ghislaine Morvan-Dubois; Dominique Le Guellec; Robert Garrone; Louise Zylberberg; Laure Bonnaud
Journal:  J Mol Evol       Date:  2003-11       Impact factor: 2.395

2.  The collagens of hydra provide insight into the evolution of metazoan extracellular matrices.

Authors:  Xiaoming Zhang; Raymond P Boot-Handford; Julie Huxley-Jones; Lorna N Forse; A Paul Mould; David L Robertson; Matthews Athiyal; Michael P Sarras
Journal:  J Biol Chem       Date:  2007-01-03       Impact factor: 5.157

3.  Collagen gene construction and evolution.

Authors:  B Runnegar
Journal:  J Mol Evol       Date:  1985       Impact factor: 2.395

4.  Characterization of a collagen from codfish skin containing three chromatographically different alpha chains.

Authors:  K A Piez
Journal:  Biochemistry       Date:  1965-12       Impact factor: 3.162

Review 5.  The fibrillar collagen family.

Authors:  Jean-Yves Exposito; Ulrich Valcourt; Caroline Cluzel; Claire Lethias
Journal:  Int J Mol Sci       Date:  2010-01-28       Impact factor: 6.208

6.  Cleavage N-terminal to proline: analysis of a database of peptide tandem mass spectra.

Authors:  Linda A Breci; David L Tabb; John R Yates; Vicki H Wysocki
Journal:  Anal Chem       Date:  2003-05-01       Impact factor: 6.986

7.  Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

Authors:  Fabian Sievers; Andreas Wilm; David Dineen; Toby J Gibson; Kevin Karplus; Weizhong Li; Rodrigo Lopez; Hamish McWilliam; Michael Remmert; Johannes Söding; Julie D Thompson; Desmond G Higgins
Journal:  Mol Syst Biol       Date:  2011-10-11       Impact factor: 11.429

8.  Macro- and micromechanical remodelling in the fish atrium is associated with regulation of collagen 1 alpha 3 chain expression.

Authors:  Adam N Keen; Andrew J Fenna; James C McConnell; Michael J Sherratt; Peter Gardner; Holly A Shiels
Journal:  Pflugers Arch       Date:  2018-03-28       Impact factor: 3.657

9.  Non-targeted and targeted analysis of collagen hydrolysates during the course of digestion and absorption.

Authors:  Anne J Kleinnijenhuis; Frédérique L van Holthoon; Annet J H Maathuis; Barbara Vanhoecke; Janne Prawitt; Fabien Wauquier; Yohann Wittrant
Journal:  Anal Bioanal Chem       Date:  2019-12-24       Impact factor: 4.142

10.  A predictive model for vertebrate bone identification from collagen using proteomic mass spectrometry.

Authors:  Heyi Yang; Erin R Butler; Samantha A Monier; Jennifer Teubl; David Fenyö; Beatrix Ueberheide; Donald Siegel
Journal:  Sci Rep       Date:  2021-05-25       Impact factor: 4.379
