Kristine Korzow Richter (1), Maria C. Codlin (2), Melina Seabrook (1), Christina Warinner (1, 3). (1) Department of Anthropology, Harvard University, Cambridge, MA 02138. (2) Department of Anthropology, Boston University, Boston, MA 02215. (3) Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany.
Abstract
Collagen peptide mass fingerprinting by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry, also known as zooarchaeology by mass spectrometry (ZooMS), is a rapidly growing analytical technique in the fields of archaeology, ecology, and cultural heritage. Minimally destructive and cost effective, ZooMS enables rapid taxonomic identification of large bone assemblages, cultural heritage objects, and other organic materials of animal origin. As its importance grows as both a research and a conservation tool, it is critical to ensure that its expanding body of users understands its fundamental principles, strengths, and limitations. Here, we outline the basic functionality of ZooMS and provide guidance on interpreting collagen spectra from archaeological bones. We further examine the growing potential of applying ZooMS to nonmammalian assemblages, discuss available options for minimally and nondestructive analyses, and explore the potential for peptide mass fingerprinting to be expanded to noncollagenous proteins. We describe the current limitations of the method regarding accessibility, and we propose solutions for the future. Finally, we review the explosive growth of ZooMS over the past decade and highlight the remarkably diverse applications for which the technique is suited.
Keywords:
MALDI-TOF; mass spectrometry; peptide mass fingerprinting; zooarchaeology
Zooarchaeology by mass spectrometry (ZooMS) is a powerful application of collagen peptide mass fingerprinting (PMF) first developed just over a decade ago (1). Based on the measurement of tryptic collagen peptides using a matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometer, it combines the high abundance and long-term preservation of collagen in bone and other animal tissues with the analytical power of mass spectrometry (MS) to provide robust taxonomic identifications using minimally destructive methods. Since 2009, ZooMS has been used for diverse applications in archaeology and paleontology, ecology and conservation, and cultural heritage. The key features that have driven its rapid expansion are its low sample input requirements and its relatively low analytical cost per sample compared with other biomolecular identification methods. This allows for large-scale taxonomic investigations that can augment morphological analyses of faunal assemblages and provide taxonomic clarity for animal remains or products lacking diagnostic features, as is common for worked bone artifacts and cultural heritage objects (2).
ZooMS: One Decade In
Over the past decade, ZooMS has been widely used to identify collagenous materials, including bone (1), ivory (3), antler (4), parchment and vellum (5), leather (6), and other soft tissues (7). Within archaeology, the application of ZooMS to faunal assemblages has allowed a wide range of topics to be explored, including domestic herd management, choices relating to secondary product use, exploitation of wild species, and the appearance of commensal species (5,8–12) (). Improvements in scalability, automation, and high-throughput processing mean that ZooMS can be used as a screening tool to identify species of interest among otherwise unidentifiable fragmentary remains. This approach has been highly successful in identifying a handful of hominin remains among nearly 10,000 bone fragments from Denisova Cave (13–15), human remains at other Late Pleistocene and Early Holocene sites (16,17), and extinct megafauna (18).
Better taxonomic resolution of assemblages has also allowed for improved ecological reconstructions. ZooMS is most useful where morphologically similar species inhabit different ecological niches. This applies to the reconstruction of terrestrial ecosystems (19,20), but ZooMS is even more powerful for reconstructing aquatic ecosystems, because more species are usually present and conventional techniques often cannot achieve the desired taxonomic resolution (21,22).
Although the integration of ZooMS findings into current conservation practice remains limited, recent successes in identifying ivory (3) and distinguishing wild African bovids (11,23) show great promise for low-cost identification of illicitly traded animal products, such as ivory objects and bushmeat.
Finally, because ZooMS requires very little starting material and can be performed with minimally destructive and noninvasive protocols, it is an ideal method for identifying worked bone tools and other composite artifacts from both archaeological and cultural heritage contexts. These include worked bone and antler (bone points, arrowheads, daggers, rings, combs), leather, composite artifacts, parchment, works of art, and gelatin-coated photographs (5–7,24–29). ZooMS has also been used to help identify remains in museum collections whose associated metadata have been lost (30).
The first decade of ZooMS has showcased its wide-ranging, continued applicability (). Although tandem mass spectrometry and ancient DNA approaches provide higher taxonomic resolution, PMF-based methods are more cost-effective, allow greater sample throughput, and provide sufficient taxonomic resolution for many archaeological and cultural heritage questions, making them accessible to more researchers. The past decade has also highlighted the need for further development of ZooMS, particularly in standardizing data reporting, centralizing marker databases, using consistent nomenclature, and developing automated tools for data analysis. Addressing these limitations will allow ZooMS to grow from the purview of a small number of research laboratories into a robust, widely used method. Here, we review the state of the field in ZooMS research.
After detailing what collagen is and why it is important in archaeological and cultural heritage research, we describe the major technological advancements that led to the development of ZooMS and the growth of its subsequent applications. We review how the method has changed and expanded over the past decade, and we outline current limitations in the field. Finally, we discuss the outlook of ZooMS research over the next decade.
Collagen: What It Is, and Why It Matters
Collagens are an abundant class of structural proteins essential for life in animals, from sponges to humans (31). There are nearly 30 different collagen proteins, of which type I collagen (COL1) is among the most ubiquitous and abundant (32), comprising 80% of the bone proteome (33). COL1 is the major component of animal connective tissues. It is highly abundant in skin, tendon, ligament, and fish scales (33,34), and it is also found in bird and reptile eggshells (35,36), invertebrate shells (37), and a wide range of other animal-derived tissues. Common archaeological remains and cultural heritage objects that contain collagen include bone, cartilage, antler, horn cores (but not horn itself), tooth dentine and cementum, ivory, parchment, leather, fish scales, and composite tools or artwork containing sinews, animal glues, or animal binders (Fig. 1).
Fig. 1.
Overview of collagen structure and archaeological sources. Collagen can be retrieved from a wide range of animal tissues. In most animals, the COL1 triple helix is composed of two α1-chains and one α2-chain. Five triple helices are bundled into a microfibril. Bundles of microfibrils form a fibril, and bundles of fibrils form fibers. During the initial stages of ZooMS, this structure is denatured, allowing the enzyme trypsin to cut the protein into peptides. Peptides differ in sequence and mass across taxa, as shown for turkey (M. gallopavo), goat (C. hircus), and coho salmon (O. kisutch). Icons are from https://openmoji.org and https://smart.servier.com. Adapted from ref. 42.
COL1 Fibril Formation and Structure.
At the molecular level, COL1 consists of a triple helix made up of three polypeptide α-chains (COL1α) (38). In tetrapods, the triple helix is heterotrimeric, composed of two identical COL1α1 chains and one COL1α2 chain (Fig. 1). In teleost fish, it is made up of three different chains (COL1α1, COL1α2, COL1α3) (39), while a small number of species, such as the cnidarian hydra, have homotrimeric COL1 composed of three COL1α1 chains. The amino acid sequence of COL1 is highly constrained, both structurally and functionally (31). Each chain consists of a repeating G-X-Y motif (Fig. 1), in which glycine (G), the smallest amino acid, fits into the central core of the tightly coiled triple helix. The X and Y positions are disproportionately occupied by proline and hydroxyproline, respectively; hydroxyproline is a posttranslational modification (PTM) of proline rarely found outside collagens. Hydroxyprolines stabilize the triple helix through hydrogen bonding (40,41) and, at a given amino acid position, can be present in every molecule (fixed modification) or in only some molecules (variable modification). Amino acids with bulky functional groups are almost entirely absent from COL1 because they disrupt or prevent the formation of the triple helix.
During COL1 formation, pro-COL1α chains are produced containing a signal peptide (∼20 amino acids) and N-terminal and C-terminal propeptides (∼150 to 300 amino acids each) that flank the mature α-chain (∼1,000 amino acids). The signal peptide directs the chain to the endoplasmic reticulum, where the propeptides initiate triple-helix formation. The signal peptide and propeptides are removed prior to fibrillogenesis and are therefore not relevant for ZooMS (38,42). The mature COL1 protein consists of a highly conserved triple-helical region flanked at each end by short, highly variable telopeptides (∼9 to 30 amino acids).
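The G-X-Y rule described above can be checked mechanically. The sketch below uses a hypothetical helper, `gxy_frame`, that is not part of any published ZooMS tool: it reports the reading frame (0, 1, or 2) in which every third residue is glycine. The input strings are toy sequences, not real COL1 chains, and the check is only meaningful for the triple-helical region, since the flanking telopeptides do not follow the repeat.

```python
def gxy_frame(seq: str):
    """Return the offset (0, 1 or 2) at which every third residue is glycine.

    Returns None if no offset fits, as expected for non-helical stretches
    such as telopeptides.
    """
    for offset in range(3):
        if all(seq[i] == "G" for i in range(offset, len(seq), 3)):
            return offset
    return None


# Toy sequences (hypothetical, not taken from any real collagen chain).
print(gxy_frame("GPAGPPGER"))  # 0: glycine at positions 0, 3, 6
print(gxy_frame("PGPPGAPGE"))  # 1: glycine at positions 1, 4, 7
print(gxy_frame("GPAWPPGER"))  # None: the bulky W breaks the repeat
```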
After cellular secretion, the collagen triple helices aggregate into groups of approximately five to form microfibrils (43). Groups of microfibrils aggregate to form fibrils, which then bundle together to form collagen fibers (Fig. 1). In bone, collagen fibers serve as the template for biomineralization (44). The tight packing and bundling of collagen are essential to its function as a structural protein and contribute to collagen’s long-term persistence and preservation in the archaeological record (45,46).
Collagen Degradation and Recovery in Various Archaeological Tissues.
Collagen preservation has been extensively studied, in part because of the importance of bone collagen for stable isotope analysis and radiocarbon dating (45,47). Studies of paleontological and archaeological collagen have revealed that well-preserved, high molecular weight collagen (>30 kDa), exceeding 30% of the original protein length, can be found even within extremely old bones (>1 Mya) (45,48–50). Despite its abundance and robusticity, however, collagen remains susceptible to taphonomic and diagenetic alteration. A wide range of soil microbes and fungi can produce collagenases that rapidly degrade unmineralized collagen, and even mineralized collagen undergoes spontaneous chemical hydrolysis of peptide bonds (51). Decomposing soft tissue contributes to bone demineralization by exposing collagen to microbial attack, and organic acids secreted during microbial growth further demineralize skeletal material (46,52). Individual amino acids can undergo diagenetic alterations, such as deamidation, glycation, oxidation, and cross-linking, to produce diagenetiforms, which disrupt the triple-helical structure and make the collagen backbone more susceptible to hydrolysis (45,53,54). Over time, the integrity of the collagen within archaeological remains declines, and collagen degradation products begin to leach out.
For archaeological remains, collagen preservation is typically estimated by measuring the dry weight percentage (%wt) of collagen or by determining the percentage of nitrogen (%N) or the atomic C:N ratio of a given collagenous material (49,55,56). Depending on the species and type of bone, fresh bone contains 20 to 35% organic matrix, of which ∼90% is collagen (57,58), giving bone a %N of ∼3.5 to 4.5% (55) and, for humans, a C:N of 3.2 (59).
As a general rule, a collagen yield of at least 1%, at least 0.5% N, and an atomic C:N ratio between 2.9 and 3.6 are widely used as minimum standards for collagen preservation in stable isotope and radiocarbon dating studies (55,59). ZooMS requires even less material and has been shown to reliably yield identifiable collagen spectra from bones with as little as 0.26% N (60). As such, ZooMS can generally be applied to a wider range of archaeological remains, including those from challenging environments and deep time that would be ill suited to other methods (61).
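The preservation criteria above reduce to simple arithmetic. As a minimal sketch, the hypothetical helpers below convert weight-percent C and N into an atomic C:N ratio (dividing each by its standard atomic weight) and apply the quoted thresholds. They assume the caller supplies each measurement from the appropriate fraction (whole bone or collagen extract), which this sketch does not model; the 0.26% N figure is treated only as a rough screening heuristic, not a hard cutoff, and the sample values are invented.

```python
C_WEIGHT = 12.011  # standard atomic weight of carbon
N_WEIGHT = 14.007  # standard atomic weight of nitrogen


def atomic_cn(pct_c: float, pct_n: float) -> float:
    """Convert weight-percent C and N into an atomic (molar) C:N ratio."""
    return (pct_c / C_WEIGHT) / (pct_n / N_WEIGHT)


def meets_isotope_criteria(collagen_wt: float, pct_n: float, cn: float) -> bool:
    """Thresholds quoted in the text: >=1% collagen, >=0.5% N, 2.9 <= C:N <= 3.6."""
    return collagen_wt >= 1.0 and pct_n >= 0.5 and 2.9 <= cn <= 3.6


def worth_zooms_attempt(pct_n: float, floor: float = 0.26) -> bool:
    """Hypothetical screen: identifiable ZooMS spectra reported down to ~0.26% N."""
    return pct_n >= floor


# Hypothetical measurements for one sample.
cn = atomic_cn(42.0, 15.0)                   # invented %C and %N of an extract
print(round(cn, 2))                          # 3.27
print(meets_isotope_criteria(0.8, 0.4, cn))  # False: too little collagen and N
print(worth_zooms_attempt(0.4))              # True: still worth a ZooMS attempt
```

A sample like this one would fail the stable isotope and radiocarbon criteria yet remain a reasonable ZooMS candidate, which is the practical point of the paragraph above.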
PMF: The Mass Spectrometry Revolution for Paleoproteomics and Cultural Heritage
Peptide Mass Fingerprint Basics.
PMF is a technique, developed in the 1990s (62), that identifies proteins by the masses of the peptides produced by enzymatic digestion. Its development was made possible by the invention of the matrix-assisted laser desorption/ionization (MALDI) soft ionization method in the late 1980s (63). MALDI was a major breakthrough in protein chemistry, enabling large nonvolatile molecules, such as small proteins and peptides, to be ionized without fragmentation for downstream mass spectrometry. Coupled with a TOF analyzer, MALDI-TOF is a robust, simple, and sensitive system with a large mass range (62) that is ideally suited for PMF. PMF works best on individual proteins and on complex mixtures of consistent composition (64,65). Although most archaeological remains contain complex and variable protein mixtures, some collagenous tissues and residues are so dominated by COL1 that they can be analyzed by PMF. The use of COL1 PMF for taxonomic identification is referred to as ZooMS (1).
While the principles behind ZooMS are relatively straightforward, in practice the technique can be complicated by posttranslational and diagenetic modifications, the lack of taxonomically informative markers, and other factors. The first step in ZooMS analysis is to extract the collagen from its matrix. This step is the most variable, as it depends on the type of material, its preservation history, its previous treatment, and whether it can be destructively sampled. For mineralized tissues that can be destructively sampled, dissolving the mineral matrix with hydrochloric acid is thought to be the most reliable method, especially for poorly preserved samples (1). For nonmineralized tissues, when less destructive analysis is desired, or when the use of acid is problematic, alternative methods are available.
During extraction, COL1 is gelatinized by heating, which denatures the triple helix so that the polypeptide chains are accessible for enzymatic digestion. Because soil humic acids and other base-soluble compounds can interfere with MALDI-TOF analysis, a brief incubation in dilute sodium hydroxide (6,66) or other treatments (67) can optionally be applied during extraction to remove them, improving downstream spectral quality. At the end of extraction, the free collagen chains are suspended in a pH-neutral solution.
After extraction, the collagen is digested with a protease, typically trypsin, which cleaves peptide bonds on the C-terminal side of arginine (R) and lysine (K) residues. Each cleaved bond is hydrolyzed, so every peptide's mass equals the sum of its residue masses plus one water molecule. This produces a series of collagen peptides that differ in length and mass between taxa (Fig. 1). The peptides are then acidified, purified with a C18 filter, and spotted onto a MALDI plate with a matrix, typically α-cyano-4-hydroxycinnamic acid, that cocrystallizes with the peptides (Fig. 2). The matrix is then excited with a laser, causing the peptides to desorb and ionize, predominantly with a +1 charge. An electric field accelerates the ions into a time-of-flight tube, where they separate by mass, with the smallest peptides reaching the detector first and the largest last (Fig. 2). The resulting spectrum is then calibrated against standards of known mass to convert flight time into mass-to-charge ratio (m/z), and the observed peaks are ready for analysis (Fig. 2).
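The time-to-m/z calibration step can be illustrated with a simple model. Ions accelerated through the same potential gain the same kinetic energy, so over a fixed flight length their flight time scales with the square root of m/z, plus an instrumental offset. The sketch below assumes the linear model sqrt(m/z) = a·t + b and fits it by least squares; real instrument software may add higher-order correction terms. The calibrant masses and the coefficients used to generate their flight times are hypothetical.

```python
import math


def fit_calibration(times, masses):
    """Least-squares fit of sqrt(m/z) = a * t + b to calibrant standards.

    times:  flight times of the standards (any consistent unit)
    masses: their known m/z values
    """
    if len(times) != len(masses) or len(times) < 2:
        raise ValueError("need at least two matched calibrants")
    ys = [math.sqrt(m) for m in masses]
    n = len(times)
    mean_t, mean_y = sum(times) / n, sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    a = sxy / sxx
    b = mean_y - a * mean_t
    return a, b


def time_to_mz(t, a, b):
    """Convert a flight time to m/z with the fitted calibration."""
    return (a * t + b) ** 2


# Synthetic example: hypothetical calibrants whose times follow a=0.9, b=0.1.
true_a, true_b = 0.9, 0.1
standards = [1000.0, 2000.0, 3000.0]
times = [(math.sqrt(m) - true_b) / true_a for m in standards]
a, b = fit_calibration(times, standards)
unknown_t = (math.sqrt(1580.0) - true_b) / true_a
print(round(time_to_mz(unknown_t, a, b), 3))  # 1580.0
```

Fitting sqrt(m/z) rather than m/z keeps the model linear in its coefficients, so a closed-form least-squares fit is enough for a sketch.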
Fig. 2.
Steps of MALDI-TOF and representative collagen spectra. Digested collagen peptides (pink) are embedded in the matrix (blue) and ionized with a laser. Charged peptides (+1) are then accelerated through a TOF tube, where they separate by mass. The output of the detector is visualized as spectra, where time is converted to m/z based on calibration standards. Collagen spectra are shown for turkey (M. gallopavo), goat (C. hircus), and coho salmon (O. kisutch). Authenticated collagen peptide peaks are indicated by pink circles (Dataset S1). Three taxonomically informative marker peptides are annotated, with Insets indicating the collagen chain, position, m/z, and sequence; amino acids that differ across taxa are highlighted in green. Note that although isoleucine (I) and leucine (L) differences are highlighted, they are not distinguishable by MALDI-TOF. Baseline correction, smoothing, and intensity normalization were performed in mMass (134). Adapted from ref. 42. Data from refs. 21, 140, and 141.
Peptide Mass Fingerprints of Collagen and How They Are Used.
Interpreting a mass spectrum involves associating a peak at a given m/z with a specific peptide sequence. ZooMS typically uses MALDI-TOF to measure peptide masses between 800 and 3,500 Da, which correspond to peptides of ∼8 to 30 amino acids in length. Although in theory all COL1 peptides in this mass range should appear in the PMF, in practice not all are observed. Cross-linking, glycosylation, glycation, incomplete digestion, and poor ionization can all prevent predicted COL1 peptide peaks from being observed, as can unexpected PTMs and nonenzymatic peptide fragmentation.
Here, we show how this occurs in practice. Fig. 2 shows the COL1 PMFs of turkey (Meleagris gallopavo), goat (Capra hircus), and coho salmon (Oncorhynchus kisutch). The sequences of the COL1α chains differ among these three species (Fig. 1), and this is reflected in the different peak positions of their peptides (Fig. 2). In Fig. 2, peaks with masses corresponding to known collagen peptides are annotated with pink circles (details in Dataset S1). Not all collagen peptides are taxonomically informative; the informative ones are called marker peptides. Three marker peptides, COL1α1 508-519, COL1α2 454-483, and COL1α2 757-789 (nomenclature follows ref. 42), are highlighted. Although marker peptide COL1α1 508-519 is not visible in the coho salmon spectrum, it has been previously observed and reported for this species (22).
Fig. 3 illustrates some of the challenges in analyzing collagen PMFs. Besides collagen peaks (pink circles), ZooMS spectra also contain contaminant peaks, the most common and abundant being keratins from skin and clothing (green circles), along with matrix peaks (blue circles) and, occasionally, trypsin autolysis peaks (68). Short COL1 peptides with masses <1,000 Da are rarely used as marker peptides because they frequently overlap with matrix peaks (Fig. 3), and long, high-mass peptides are often less reliably visible in spectra (Fig. 3).
Consequently, the most robust and useful marker peptides tend to be ∼1,000 to 3,100 Da in size (Dataset S1).
In vivo PTMs and diagenetiforms can complicate analysis by inducing mass shifts. However, only two are relevant for ZooMS, because most of these modifications are present only at low levels or arise spontaneously, and they are therefore not typically visible by MALDI-TOF MS. The most important PTM is the hydroxylation of proline. Collagen spectra often contain mixtures of the same peptide with different numbers of hydroxyprolines, each hydroxyproline causing a mass shift of +16 Da. For example, the sheep/cattle peptide COL1α2 757-789 typically produces two peaks: one at 3,017 Da with four hydroxyprolines and one at 3,033 Da with five hydroxyprolines (1). Although rarely visible by MALDI-TOF MS, variants with three and six hydroxyprolines are also present in bovids, as shown by liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis (11). Beyond the hydroxyprolines present in mature collagen, recent studies have shown that stochastic gains and losses of hydroxyproline may also occur postmortem during diagenesis (54).
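The mass arithmetic in this section can be sketched in a few lines. The example below is a minimal, hypothetical sketch rather than the method used in the cited studies. It assumes standard monoisotopic residue masses, singly protonated [M+H]+ ions, and the common convention (an assumption here, not stated in the text) that trypsin does not cleave before proline. It performs an in silico tryptic digest of a toy sequence that is not a real COL1 chain, computes peptide masses, keeps those in the 800 to 3,500 Da window, and predicts the +16 Da hydroxyproline ladder for the 3,017 Da peptide discussed above.

```python
# Monoisotopic residue masses (Da). I and L are isobaric, which is why
# MALDI-TOF cannot tell them apart.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide (termini created by hydrolysis)
PROTON = 1.00728   # singly charged [M+H]+ ion
HYP = 15.99491     # +1 oxygen per proline hydroxylated to hydroxyproline


def tryptic_digest(seq, skip_before_proline=True):
    """Cleave after K or R; by default not when the next residue is P."""
    peptides, start = [], 0
    for i in range(len(seq) - 1):
        if seq[i] in "KR" and not (skip_before_proline and seq[i + 1] == "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])
    return [p for p in peptides if p]


def mh_plus(peptide, n_hyp=0):
    """Monoisotopic [M+H]+ of a peptide carrying n_hyp hydroxyprolines."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON + n_hyp * HYP


def in_window(peptides, lo=800.0, hi=3500.0):
    """Keep unmodified peptides whose [M+H]+ falls in the usual ZooMS range."""
    return [p for p in peptides if lo <= mh_plus(p) <= hi]


def hyp_series(observed, n_observed, counts):
    """Predict masses of the same peptide with other hydroxyproline counts."""
    return {k: observed + (k - n_observed) * HYP for k in counts}


toy = "AAKPGGRGGK"  # hypothetical toy sequence, not a real collagen chain
print(tryptic_digest(toy))                             # ['AAKPGGR', 'GGK']
print(tryptic_digest(toy, skip_before_proline=False))  # ['AAK', 'PGGR', 'GGK']
print(round(mh_plus("GGK"), 4))                        # 261.1557
print(in_window(tryptic_digest(toy)))                  # [] (all below 800 Da)
series = hyp_series(3017.0, 4, range(3, 7))
print({k: round(m) for k, m in series.items()})        # {3: 3001, 4: 3017, 5: 3033, 6: 3049}
```

The nominal 3,001 and 3,049 Da values are simply the predicted positions for the three- and six-hydroxyproline variants; as noted above, these are rarely visible by MALDI-TOF MS.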
Fig. 3.
Interpreting a collagen peptide mass fingerprint. (A) Spectrum of a goat (Upper) and an archaeological unknown bone (YTC 29-1) from the site of Tepe Yahya, Iran (Lower). Colored circles indicate authenticated tryptic collagen peaks (pink), tryptic keratin peaks (green), and matrix peaks (blue). Peaks that could derive from different peptides of the same mass are indicated with both colors. (A, i) Matrix peaks are typically low mass (<1,000 Da) and overlap with short collagen peptides. (A, ii) High-mass peptides are often underrepresented. (A, iii) Peptides with the same or similar mass have overlapping peaks and may not be distinguishable, thus making them poor markers. (A, iv and v) Isotope distributions that derive from a single peptide peak differ for low- and high-mass peptides. (vi) Peaks that are not specific because they can derive from different peptides also make poor marker peptides. (B) Enhanced view of mass ranges from 1,520 to 1,600 Da (Left) and from 3,015 to 3,100 Da (Right) highlighted in A, as well as from sheep (O. aries), springbok (A. marsupialis), and archaeological unknown bones YTC 141-1 and YTC 29-1. Collagen peak masses are indicated above the spectra, with markers in bold and pink shading (Dataset S1). Unknown bone YTC 141-1 has collagen markers at 1,580 and 3,093 Da, indicating that it is a goat. Unknown YTC 29-1 has marker 1,580 Da, confirming that it belongs to Caprinae, but it lacks sufficient signal at higher-mass peptides to distinguish between sheep and goat. Data from ref. 141.
The most relevant diagenetiforms are the deamidation of glutamine (Q) to glutamic acid (E) and of asparagine (N) to aspartic acid (D), both of which are commonly observed in archaeological proteins (53,69,70). Deamidation can occur through side-chain hydrolysis or condensation reactions (53), resulting in a net mass shift of approximately +1 Da (+0.984 Da monoisotopic) due to replacement of the amide with a carboxyl functional group.
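To make the deamidation arithmetic concrete, the sketch below enumerates the masses of one peptide with zero or more deamidated residues and flags predicted masses from two peptides that fall within a tolerance of each other. The peptide masses, the number of deamidatable sites, and the 0.5 Da tolerance are all hypothetical choices for illustration. In this invented case, peptide A with one deamidation lands within 0.02 Da of unrelated peptide B, which MALDI-TOF would not resolve.

```python
DEAMIDATION = 0.98402  # Q->E or N->D: amide -NH2 replaced by -OH


def deamidation_envelope(mass, n_sites):
    """Masses of one peptide with 0..n_sites deamidated Q/N residues."""
    return [mass + k * DEAMIDATION for k in range(n_sites + 1)]


def overlaps(masses_a, masses_b, tol=0.5):
    """Pairs of predicted masses from two peptides that lie within tol Da."""
    return [(a, b) for a in masses_a for b in masses_b if abs(a - b) <= tol]


# Hypothetical peptides: A has two deamidatable residues, B has none.
peptide_a = deamidation_envelope(1500.0, 2)
peptide_b = [1501.0]
print([round(m, 3) for m in peptide_a])  # [1500.0, 1500.984, 1501.968]
print([(round(a, 3), b) for a, b in overlaps(peptide_a, peptide_b)])  # [(1500.984, 1501.0)]
```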
Depending on the amino acid sequence and degree of degradation, a given collagen peptide may contain zero, one, or more deamidated residues, producing overlapping peak distributions; consequently, multiple permutations must be taken into account during peak identification. Deamidation has been proposed as an indicator of relative age or as a way to identify intrusive samples (53,71–73). In practice, however, this use has been questioned (70,74), as it requires very large datasets, and even then the accuracy of assigning samples to age classes is poor (75). Deamidation is likely strongly affected by postmortem treatment, such as the liming of parchment (76), by local depositional chemistry (74,75), and by the choice of extraction method (77,78), which introduces variation that limits its straightforward use as an age indicator.
Finally, many high-abundance peaks in collagen PMFs are not marker peaks, because they actually consist of two or more overlapping peptide peaks with identical or highly similar (differing by only ∼1 Da) masses (Fig. 3). For example, the peak at 3,017 Da shown in Fig. 3 has been found to be an unreliable marker because it can result either from the sheep/cattle variant of peptide COL1α2 757-789 containing four hydroxyprolines or from the highly conserved bovid peptide COL1α2 90-130 with five hydroxyprolines (11). Such problematic peaks can sometimes be recognized because the pattern and intensity of their isotope peaks differ from those expected for low-mass (Fig. 3) or high-mass (Fig. 3) peptides, but when the peptide masses are identical, overlapping peptides can only be distinguished using tandem mass spectrometry.
For these reasons, it is essential to perform MS/MS analysis on candidate markers to determine their specificity and fidelity when developing or publishing new ZooMS markers for taxonomic identification (11,42).
Despite these challenges, when marker peptides are well chosen and spectra are carefully analyzed, ZooMS is a powerful tool for assigning taxonomy to unknown bones and other collagenous remains. Fig. 3 shows an example of ZooMS identifications for two medium-sized mammal bones (YTC 29-1 and YTC 141-1) from the archaeological site of Tepe Yahya in Iran (79), including the full PMFs of a reference goat (Fig. 3, Upper) and YTC 29-1 (Fig. 3, Lower). Of the 23 identified collagen peaks visible in the goat spectrum (pink circles), 16 are shared with other members of the family Bovidae. These peaks are also present in YTC 29-1 and YTC 141-1, indicating that the unknown bones derive from a bovid. The remaining seven collagen peaks are taxonomically variable within Bovidae, and of these, three are useful markers for distinguishing the medium-sized bovids possibly present at Tepe Yahya (Dataset S1): goat, sheep (Ovis aries), and springbok (Antidorcas marsupialis) (10,23). Fig. 3 highlights the two mass windows where these diagnostic collagen peptides are visible (1,520 to 1,600 Da and 3,015 to 3,100 Da). Within these windows, a total of 13 peaks are present, only 5 of which correspond to the three diagnostic markers: COL1α2 502-519 is present at 1,580 Da in sheep/goat and at 1,550 Da in springbok; COL1α2 757-789 is present at 3,033 Da in sheep/springbok and at 3,093 Da in goat; and COL1α2 889-906 is present at 1,532 Da in springbok and at 1,560 Da in sheep/goat. Unfortunately, the peak at 1,560 Da overlaps with another collagen peptide and is therefore not informative. Archaeological bone YTC 141-1 has peaks at 1,580 and 3,093 Da, allowing it to be identified as goat.
Archaeological bone YTC 29-1 has a peak at 1,580 Da, narrowing the possibilities to sheep or goat, but it lacks a diagnostic peak at either 3,033 or 3,093 Da, suggesting that the COL1α2 757-789 peptide was not detected because of poor preservation, a failure to ionize, or another taphonomic factor. Although a small peak may be discernible at 3,093 Da, its intensity is not at least threefold above the background, making it unreliable to call. A signal-to-noise ratio of at least three is generally considered the limit of detection for MALDI-TOF, but higher thresholds should be used if the background noise is particularly strong (80,81). Consequently, YTC 29-1 can only be identified as sheep/goat.
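The decision logic of the Tepe Yahya example can be sketched as a small lookup. This is a hypothetical illustration, not the authors' software. It assumes the bone has already been placed in Bovidae from the shared peaks, uses only the two informative markers and nominal masses given in the text, calls a peak only when it lies within an assumed 0.5 Da tolerance and reaches a signal-to-noise ratio of at least three, and treats an undetected marker as uninformative rather than as evidence of absence. The peak lists below are invented to mirror the two bones.

```python
# Candidate medium-sized bovids at the site, as listed in the text.
CANDIDATES = {"sheep", "goat", "springbok"}

# Marker -> {nominal peak mass (Da): taxa showing it}; masses from the text.
# COL1a2 889-906 is omitted: its sheep/goat peak (1,560 Da) overlaps another peptide.
MARKERS = {
    "COL1a2 502-519": {1580.0: {"sheep", "goat"}, 1550.0: {"springbok"}},
    "COL1a2 757-789": {3033.0: {"sheep", "springbok"}, 3093.0: {"goat"}},
}


def detected(peaks, target, tol=0.5, min_snr=3.0):
    """True if any (m/z, S/N) peak lies within tol Da of target with S/N >= min_snr."""
    return any(abs(mz - target) <= tol and snr >= min_snr for mz, snr in peaks)


def identify(peaks, tol=0.5, min_snr=3.0):
    """Intersect the taxa compatible with each reliably detected marker."""
    taxa = set(CANDIDATES)
    for variants in MARKERS.values():
        hits = [t for mass, t in variants.items() if detected(peaks, mass, tol, min_snr)]
        if hits:  # a missing marker may reflect preservation or ionization, so skip it
            taxa &= set().union(*hits)
    return taxa


# Hypothetical peak lists (m/z, S/N) loosely mirroring the two bones.
ytc_141_1 = [(1580.1, 12.0), (3093.2, 6.5)]
ytc_29_1 = [(1579.9, 9.0), (3093.1, 2.1)]  # the 3,093 Da peak is below S/N 3
print(sorted(identify(ytc_141_1)))  # ['goat']
print(sorted(identify(ytc_29_1)))   # ['goat', 'sheep']
```

Raising `min_snr` above 3 implements the stricter threshold recommended for noisy spectra without changing the rest of the logic.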
ZooMS: Methods and State of the Field
Foundational Work on Mammals.
Prior to ZooMS, PMF was used to identify collagenous glues and paint binders from milk- or egg-based materials in artwork, with successful taxonomic discrimination between rabbit, cow, and fish glues (82,83). ZooMS was then developed to identify faunal skeletal remains using an initial set of seven markers derived from a comparative analysis of 32 mammals and four birds, including a variety of wild terrestrial and aquatic species, domesticates, and commensal small mammals (1). Early work focused on determining the utility of ZooMS for distinguishing particular species, such as sheep/goat (10) and mammoth/mastodon (84). As the field grew, the number of markers expanded to nine (Dataset S1), and the technique was applied to a wider range of animals, though still primarily medium and large mammals (9,17,48,85). Over the past 5 years, the range of characterized mammals has increased dramatically and diversified geographically to include, for example, wild bovids (11), rodents (20,86), bats (20), and marsupials (87) (). Recent work has also further explored multispecies collagen mixtures found in animal glues (88).
Taxonomic Resolution and Expansion of the Marker Database.
Because of differing thermal and other functional constraints, collagen sequence variability differs substantially across major animal clades, such as birds, mammals, and fish (Dataset S1), and this has important implications for the taxonomic identifications that ZooMS can provide. Large mammals, for which collagen sequence data and ZooMS markers are most developed, are reliably identifiable to the family level (1,9), and subfamily-level (e.g., cetaceans) and even genus-level (e.g., sheep/goat) separation is possible in some cases using the current nine standard markers. Recent work on identifying and confirming additional markers has increased the taxonomic resolution for bovids (11,23), Elephantiformes (89), and some cervids (90), but current markers provide limited resolution for several groups in which taxonomic discrimination would be useful, including camelids, felids, and many cervids. In small-bodied mammals, greater taxonomic resolution is often possible because of the greater variability of their collagen sequences, with more cases of genus-level and even some species-level identifications (86,91).
From its earliest days, ZooMS was occasionally applied to nonmammals, but only recently has the method been developed in earnest for birds (8), fish (21,22,92), amphibians (93), and reptiles (12,20,93). Fish have been shown to have the most collagen sequence variants (94,95), owing to the relaxed functional constraints associated with their lower body temperature and buoyancy, as well as the presence of a distinct third α-chain (COL1α3) in most fishes (21,96). As a result, fish have the highest potential for species-level taxonomic resolution (2,21,22,92,97,98).
The high collagen sequence variability of fish is particularly beneficial for zooarchaeologists because fish bones are often difficult to identify owing to their lack of morphologically diagnostic features (99).
Preliminary investigations into the use of ZooMS to identify reptiles and amphibians have also achieved putative species-level identifications (12,93), although further work is needed given the small number of samples tested and the current lack of genetic data for many reptiles and amphibians. Interest in applying ZooMS to birds has lagged behind that for other groups, in part because the slower mutation rate of avian collagen makes it more difficult to distinguish related groups of birds (2,8); nevertheless, ZooMS remains useful for identifying key avian domesticates (8). For all faunal groups, taxonomic precision can be improved by taking additional geographical, morphological, and archaeological context into account to constrain possible ZooMS identifications. As more collagen sequences become available in genomic databases, the applicability of ZooMS to nonmammalian taxonomic groups is expected to improve.
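The relationship between candidate taxa, taxonomic resolution, and contextual constraints can be made concrete: a ZooMS result is often a set of candidate taxa, and the reportable identification is the most specific rank they all share. The Python sketch below assumes a hand-written, simplified lineage table for the three bovids from the Tepe Yahya example and a hypothetical `allowed` set standing in for geographical or archaeological context; neither is drawn from a published marker database, and the function name is illustrative.

```python
# Simplified lineages for the three candidate bovids (hypothetical lookup
# table for illustration; a real workflow would use a taxonomic database).
RANKS = ("family", "subfamily", "tribe", "genus", "species")
LINEAGES = {
    "sheep": ("Bovidae", "Caprinae", "Caprini", "Ovis", "Ovis aries"),
    "goat": ("Bovidae", "Caprinae", "Caprini", "Capra", "Capra hircus"),
    "springbok": ("Bovidae", "Antilopinae", "Antilopini", "Antidorcas",
                  "Antidorcas marsupialis"),
}


def finest_shared_rank(candidates, allowed=None):
    """Return (rank, name) of the most specific taxon shared by all candidates.

    `allowed` optionally restricts the candidates using contextual
    information, such as taxa plausible for the site, region, and period.
    Returns None if no candidate survives the filter.
    """
    pool = set(candidates)
    if allowed is not None:
        pool &= set(allowed)
    if not pool:
        return None
    lineages = [LINEAGES[taxon] for taxon in pool]
    shared = None
    for depth, rank in enumerate(RANKS):
        names = {lineage[depth] for lineage in lineages}
        if len(names) > 1:  # candidates diverge at this rank, so stop
            break
        shared = (rank, names.pop())
    return shared


print(finest_shared_rank({"sheep", "goat"}))               # ('tribe', 'Caprini')
print(finest_shared_rank({"sheep", "goat", "springbok"}))  # ('family', 'Bovidae')
print(finest_shared_rank({"sheep", "goat", "springbok"},
                         allowed={"sheep", "goat"}))       # ('tribe', 'Caprini')
```

Applying the context filter before computing the shared rank is what lets, for instance, a three-way bovid result narrow to Caprini if springbok could be excluded on other grounds.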
Development of Minimally Destructive and Noninvasive Sampling Protocols.
One advantage of ZooMS is that it can be applied to very small amounts of collagen, allowing for minimally destructive and noninvasive sampling techniques. Mineralized samples and leather can be soaked overnight in an ammonium bicarbonate buffer and then heated briefly to extract soluble collagen (100). This method causes minimal observable damage to bone samples, which can then be dried and subsequently reused for stable isotope or ancient DNA studies (4,100). Other minimally destructive and noninvasive sampling techniques rely on the triboelectric effect, in which friction between a plastic polymer and a protein creates an electrostatic attraction that captures proteins on the surface of the polymer (28). This technique was first developed for parchment, which was sampled using a polyvinyl chloride eraser (5). Since then, the triboelectric effect has also been used to retrieve collagen successfully from plastic sample bags and vials. This method is particularly suited to sampling worked bone artifacts (28,101) and has also been used to recover collagen from “empty” vials used in the lyophilization of collagen for isotopic analysis (16). Recently, a number of additional minimally destructive methods have been developed for art and cultural heritage materials. These include polishing films with grit (25), ethylene vinyl acetate films studded with strong cation and anion exchangers and C8/C18 resins (102), and enzyme-functionalized films (103) and hydrogels (27).
While these minimally destructive sampling techniques are generally successful for well-preserved samples, they can be less effective than conventional destructive methods for poorly preserved samples, yielding lower-quality spectra with fewer peaks (and therefore lower taxonomic resolution). Nevertheless, they offer a number of advantages. Most importantly, they can be applied to rare or fragile artifacts for which destructive sampling is not permitted.
In addition, minimally destructive sampling and sample transport are often easier to perform and can be carried out by nonspecialists (5,28,104). Finally, the sampling and extraction procedures of noninvasive techniques are generally faster than those of destructive methods because they eliminate the time-consuming acid demineralization and neutralization steps.
Expansion of PMF to Noncollagenous Proteins and Protein Mixtures.
Beyond collagen, other archaeological proteins and proteomes have also been explored for taxonomic identification using PMF. Like collagens, keratins and corneous beta-proteins (CBPs; formerly beta-keratins) are ubiquitous structural proteins, with dozens of characterized types found in mammalian epithelial cells as well as in hair, wool, nails, quills, horn, baleen, feathers, turtle shells, and reptile scales (105–107). Although these tissues are not composed of a single dominant protein, keratin and CBP mixtures can be taxonomically identified using PMF (108,109). Keratin markers have been developed for a few dozen mammal species (108–113), whereas CBP markers have been developed only for sea turtles (114,115). However, human and sheep keratins are also common contaminants in proteomics research, because human epithelial keratins are the primary constituents of airborne dust (116) and wool is a common component of clothing (68,117). Extreme caution is therefore necessary when using markers that correspond to the masses of any tryptic peptides from sheep or human keratin. Although keratins and CBPs show great promise for further PMF development, research has been delayed by a lack of available sequence data and the historical nonuniformity of keratin nomenclature (118). Nevertheless, with improved databases, genus-level taxonomic resolution appears achievable in many circumstances.
PMF can also be used for taxonomic identification of some complex protein mixtures, such as those found in eggshells and mollusk shells. Extraction pretreatments using sodium hypochlorite (NaOCl) can remove contaminants and isolate the endogenous intracrystalline proteins before demineralization and digestion (119,120).
While advances in bird genome sequencing have aided the determination of highly variable eggshell biomarker sequences across taxa (119), incomplete taxonomic coverage of genomic data still limits the identification of most avian eggshells to order or family (121). Nevertheless, this technique has great potential for identifying avian taxa to the genus or species level in the archaeological record (121–123) and is likely also suitable for identifying fossilized reptilian eggshells. Mollusk shells have long been known to contain organic compounds, including proteins, but archaeological mollusk shell proteomes have only recently been explored through PMF (124–126). Although the database is currently very limited, shell matrix proteins are highly diverse and therefore have the potential to achieve genus- or species-level resolution (127).
PMF continues to be used in art conservation to identify noncollagenous paint binders, such as caseins and beta-lactoglobulin from milk; vitellogenins, apolipoproteins, and low-density lipoprotein receptors from egg yolk; and ovalbumin, ovotransferrin, and lysozyme from egg white (128). While some paint binders originate from a single source, mixed sources can also be identified with PMF, although LC-MS/MS is better suited to identifying mixtures of unknown composition (27,83). In some cases, using two enzymes during digestion can improve resolution for noncollagenous proteins (129).
Additional biomaterials known to contain proteins with at least some taxonomic variability include terrestrial snail shells (130), corals (131), and insect exoskeleton cuticles (132). These biomaterials preserve over archaeological timescales and show promise for future work, but they have been only minimally explored to date.
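The caution about human and sheep keratin contaminants can be built into marker development as an automated check: digest the likely contaminant sequences in silico and flag any candidate marker mass that falls close to one of their tryptic peptides. The Python sketch below assumes standard monoisotopic residue masses, the common trypsin rule (cleave after K or R, but not before P), no missed cleavages and no post-translational modifications, and a hypothetical ±0.5 Da tolerance. The sequence in the usage lines is a made-up placeholder, not a real keratin; in practice, curated human and sheep keratin sequences would be supplied instead, and the function names are illustrative.

```python
# Standard monoisotopic residue masses (Da); cysteine unmodified, no PTMs.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (termini)
PROTON = 1.007276   # charge carrier for singly protonated [M+H]+ ions


def tryptic_peptides(seq, min_len=4):
    """In silico trypsin digest: cut after K/R unless the next residue is P.

    No missed cleavages are generated; peptides shorter than `min_len`
    are dropped.
    """
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return [p for p in peptides if len(p) >= min_len]


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON


def flag_collisions(marker_masses, contaminant_seqs, tol=0.5):
    """Return candidate marker masses within `tol` Da of a contaminant peptide."""
    contaminant = [mh_plus(p) for s in contaminant_seqs
                   for p in tryptic_peptides(s)]
    return {m for m in marker_masses
            if any(abs(m - c) <= tol for c in contaminant)}


# Toy sequence standing in for a keratin; NOT a real keratin sequence.
toy_contaminant = "GGGGKPAAAARSSSSK"
print(tryptic_peptides(toy_contaminant))                     # ['GGGGKPAAAAR', 'SSSSK']
print(flag_collisions([495.24, 1000.0], [toy_contaminant]))  # {495.24}
```

A flagged mass is not necessarily unusable, but it signals that the marker should be confirmed by MS/MS before being relied on.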
Current Limitations in the Field
While there is increasing interest in applying PMF, and ZooMS in particular, to zooarchaeological remains, archaeological artifacts, cultural heritage materials, and works of art, there are also significant barriers to the adoption of the technique by new research groups. Below, we outline these barriers and propose solutions to improve data reporting, standardization, and accessibility.
First, there is currently no centralized repository of COL1 markers, MALDI-TOF reference spectra, or curated collagen sequences. Instead, each research group maintains its own internal reference datasets, which need to be continuously updated as new studies are published. Additionally, while most markers have been verified by LC-MS/MS in their initial publication, some have not, owing to insufficient funds, problems of feasibility, or a lack of corresponding genetic sequence data (92,97,133). In some cases, provisional markers have later been challenged or redefined once LC-MS/MS data became available (11). Without a centralized information hub, it can be difficult to track these changes, which poses a significant obstacle for new researchers entering the field. Further complicating the learning process for new researchers, mass lists are often published with different levels of precision in m/z values, and multiple marker naming systems are currently in use (Dataset S1), although there has recently been an attempt at standardization (42).
Second, while it is becoming more common for publications to make raw MALDI-TOF data available in public data repositories (11,21,23), the raw data for most early studies are not available, and even recent studies have not always made their data available. This reduces the replicability of ZooMS research and prevents data reanalysis as new markers are identified or existing markers are redefined in light of new evidence.
As has recently been proposed for ancient protein studies using LC-MS/MS (69), new standards and guidelines are needed for publishing raw MALDI-TOF data.
Third, even when all publications are considered, current ZooMS markers remain heavily biased toward large European mammals. Although other taxonomic groups are increasingly being studied, the number of published markers relative to the number of potential species remains low. In addition, because publicly available collagen gene and protein sequences are biased toward mammals (21), ZooMS remains applicable to only a small subset of the taxonomic groups of interest.
Fourth, there is a lack of centralized training resources and methodologies. Unlike other biomolecular specialties, the ZooMS community has not yet developed regular workshops for new researchers to learn about the field and its methods. Although mammal identification protocols (https://doi.org/10.17504/protocols.io.bzscp6aw) and detailed bench protocols for several common ZooMS methods (https://doi.org/10.17504/protocols.io.bf5djq26) have recently been published, most noninvasive protocols are described only in the methods sections of publications. There has also been a lack of investment in community software: the most widely used open-source software, mMass (134), is no longer developed or supported (http://www.mmass.org/). Recently, efforts have been made to keep mMass available and compatible with new computer operating systems (through the European Research Council FINDER project; https://github.com/dreamingspires/mMass), but additional community support is needed. In addition, most species identifications, even in high-throughput screening, are not automated. Several approaches to automation are currently being explored (135,136) but remain in development.
Finally, ZooMS is fundamentally a tool for taxonomic identification.
Currently, most ZooMS work focuses on developing markers, screening for specific species, or answering questions that involve a limited number of species. As ZooMS expands, greater emphasis will be needed on situating ZooMS identifications within broader zooarchaeological frameworks and on incorporating ZooMS data into standard zooarchaeological metrics, such as the number of identified specimens (NISP), minimum number of individuals (MNI), and minimum number of elements (MNE) (14,22). Because ZooMS can be applied to fragmentary remains that would otherwise not be counted in these metrics, it will be essential to create standards for reporting ZooMS results that allow comparison and integration with morphologically identified zooarchaeological datasets.
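One way to keep ZooMS identifications comparable with morphological datasets is to record, for every specimen, both the taxon at the resolution actually achieved and the method used, and then derive metrics such as the number of identified specimens (NISP) from that table. The Python sketch below covers NISP only (minimum number of individuals and elements also require element, side, and age data); the `Identification` record, the method labels, and the third specimen are hypothetical, and the sketch assumes one identification per specimen.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Identification:
    specimen: str  # specimen or catalogue number
    taxon: str     # taxon at the resolution actually achieved, e.g. "sheep/goat"
    method: str    # how it was identified, e.g. "morphology" or "ZooMS"


def nisp(records):
    """Number of identified specimens (NISP) per (taxon, method) pair.

    Assumes one identification per specimen, so a bone identified both
    morphologically and by ZooMS must be resolved before counting.
    """
    counts, seen = Counter(), set()
    for rec in records:
        if rec.specimen in seen:
            raise ValueError(f"specimen {rec.specimen!r} appears more than once")
        seen.add(rec.specimen)
        counts[(rec.taxon, rec.method)] += 1
    return counts


def nisp_by_taxon(records):
    """Pool NISP across identification methods for each reported taxon."""
    totals = Counter()
    for (taxon, _method), n in nisp(records).items():
        totals[taxon] += n
    return totals


records = [
    Identification("YTC 141-1", "goat", "ZooMS"),
    Identification("YTC 29-1", "sheep/goat", "ZooMS"),
    Identification("example-001", "goat", "morphology"),  # hypothetical specimen
]
print(nisp(records))
print(nisp_by_taxon(records))
```

Keeping "sheep/goat" as its own category, rather than splitting it between sheep and goat, avoids reporting counts at a resolution the data do not support, and tallying by method shows how many specimens entered the assemblage only through ZooMS.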
Conclusion: ZooMS, the Next Decade
The first decade of ZooMS research has shown the potential for PMF techniques to revolutionize species identification of animal remains. To date, however, research has largely focused on method and marker development by a handful of research groups, and applications have, by necessity, frequently been limited in scope. Even so, ZooMS has had tremendous success, contributing to the discovery of the first Neanderthal–Denisovan child (137), identifying the earliest known domestic sheep in Africa (23), providing insights into the construction of the medieval York Gospels (138), and demonstrating that bone scraper technology for leatherworking has persisted for over 50,000 years (101).
The next major challenge for ZooMS will be to grow from a new method performed by highly specialized laboratories into a widely available and accessible technique with robustly supported software, centralized databases, and public data repositories. The number of research groups using ZooMS has steadily increased over the past decade, and while many of these laboratories are still actively involved in method and/or marker development, an increasing number are focusing on applied questions. The next decade of ZooMS promises an even greater expansion of applications by an even wider range of researchers, including citizen science and K–12 educational groups (139). For ZooMS to make the transition from a niche method developed by a handful of groups to a well-established technique with revolutionary applications, a community effort is needed to fund and establish an open-source marker database, stable data repositories, and training resources. The continued expansion of markers, aided by gene mining from large-scale genome sequencing projects, will allow ZooMS to be applied more broadly in time and space and to more diverse taxa.
Automation of spectral processing will increase the number of spectra that can be analyzed and provide a substantial body of data for zooarchaeologists and ecologists to interpret as they develop standardized ways to incorporate ZooMS results into established frameworks for faunal analysis and ecological reconstruction.
ZooMS will continue to be used as a screening tool for identifying taxa of interest, allowing researchers to find the proverbial needles in the haystack of millions of fragmentary and morphologically unidentifiable remains. Characterization of faunal assemblages will continue, and growing geographical and temporal coverage will enable the tracking of ecological changes, the arrival of invasive species, and fluctuations in animal exploitation through space and time. Expanding markers for ecological indicator species, especially among small mammals, fish, and reptiles, will support more comprehensive reconstructions of the ecosystems most vulnerable to climate change, helping to clarify the factors that contribute to resilience and recovery. Targeted questions about archaeological artifacts and cultural heritage materials will allow more nuanced interpretations of human–animal interactions and of the cultural and technological choices made by our ancestors. Globally, fewer than 50,000 samples have so far been analyzed by ZooMS. The next decade will be driven by the following question: what will we be able to achieve if we aim for a million ZooMS samples?