Proteomics dataset from 26th Dynasty Egyptian mummified remains sampled using minimally invasive skin sampling tape strips.

Dylan H Multari¹, Prathiba Ravishankar¹, Geraldine J Sullivan¹, Ronika K Power²,³,⁴, Constance Lord⁵, James A Fraser⁶, Paul A Haynes¹,³,⁴.

Abstract

Paleoproteomics typically involves the destructive sampling of precious bioarchaeological materials. This analysis aims to investigate the proteins identifiable via nanoLC-MS/MS from highly degraded 26th Dynasty Egyptian mummified human remains (NMR.29.1-8) after non-destructive sampling with commercially available dermatology-grade skin sampling tape strips. A collection of cranial and other bone fragments were sampled with the tape strips then subsequently analysed using a shotgun proteomics approach. The number of proteins identified using this method ranged from 18 to 437 at a peptide FDR of <1%. Deamidation ratios were assessed using an in-house R script, with asparagine deamidation averaging ∼20-30% and glutamine deamidation averaging ∼15-25%.

Keywords: Bioarchaeology; Cultural heritage; Deamidation; Mass spectrometry; Paleoproteomics

Year: 2022 PMID: 36118295 PMCID: PMC9478331 DOI: 10.1016/j.dib.2022.108562

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409

Value of the Data

This dataset highlights the types of proteins that can be identified using minimally invasive sampling techniques on ancient human bone. The data may be of use to researchers working within paleoproteomics and to those working in museums or cultural heritage organisations who are interested in new minimally invasive techniques. The data may be reused for comparison with other paleoproteomics studies, with particular reference to the types of proteins identified and the degree of deamidation observed in these samples.

Data Description

This dataset represents data from a shotgun proteomics analysis of modern skin and ancient human bone samples, collected using minimally invasive skin sampling tape [1]. The raw mass spectrometric data, along with a collated Excel workbook of Global Proteome Machine (GPM) protein outputs (filename “S1 Protein Results All Samples.xlsx”) and peptide outputs (filename “S2 Peptide Results All Samples.xlsx”) filtered at a protein expectation score log(e) of -3, have been deposited with the ProteomeXchange Consortium via the PRIDE repository [2] and are accessible using the identifier PXD029003. An Excel spreadsheet (filename “Explanation of raw files and samples.xlsx”) was also uploaded with the raw data to explain which series of raw files corresponds to which sample. The Excel spreadsheets of peptide and protein output results are also publicly available in a GitHub repository, accessible via https://github.com/dymult/DataInBrief_Deamidation. This repository also contains the deamidation R script used to assess the deamidation ratios of the samples, as well as the input GPM peptide .xlsx files and the output .png files.

Experimental Design, Materials and Methods

Sampling

Modern surrogate samples were collected using a series of one, three, and five D-Squame D100 sampling tape strips (22 mm diameter; CuDerm Corporation, Texas, USA) from the forearm of a healthy consenting volunteer. Strips were gently applied to the sample surface with uniform pressure for a total of 10 s before removal and immediate storage in a fresh 1.5 mL microcentrifuge tube. Ancient samples were collected from four cranial fragments along with the mandible and humeral diaphysis (hereafter referred to as humeral shaft). Strips were applied in a similar fashion to the modern samples across the endo- and ectocranial surfaces of the four cranial fragments, the lingual and buccolabial surfaces of the mandible (for ease of reporting, these are hereafter referred to as the interior and exterior surfaces, respectively), and along the length of the humeral shaft. A total of three strips were used on each of the cranial fragment and mandible surfaces, while a total of nine strips were used on the humeral shaft. These nine strips were later split into three groups of one, three, and five strips, respectively, to mitigate issues with fitting nine strips into a single microcentrifuge tube.

Protein Extraction

All laboratory work was performed in accordance with established guidelines for ancient protein analysis [3]. Briefly, this involved the use of nitrile gloves, sterilised workspaces and equipment, freshly prepared reagents, mass spectrometric analysis of both analytical (1% formic acid) and procedural blanks (sampling tape strips removed from packaging and immediately transferred to 1.5 mL tube), and the use of a laminar flow hood to minimise potential exogenous contamination. Protein extraction from the sampling tape strips was performed using a modified protocol adapted from Clausen et al. [4]. Each set of strips was transferred to fresh 1.5 mL microcentrifuge tubes and covered with 1 mL of phosphate buffered saline (PBS) prior to bath sonication for 15 min at room temperature. Proteins were then precipitated with ice-cold acetone at -20°C for 2 h and centrifuged at 20,000 x g for 10 min. The supernatant was discarded, and the pellet was briefly air-dried.

SDS-PAGE and Trypsin In-Gel Digestion

Protein pellets were resuspended in 50 µL 2x SDS loading buffer (100 mM Tris-HCl, 4% SDS, 0.2% bromophenol blue, 20% glycerol, 200 mM dithiothreitol), and heated to 95°C for exactly 5 min with agitation. SDS-PAGE separation was performed on a precast gel (Bio-Rad 10% Mini-PROTEAN® TGX™, 10 × 50 µL wells) at 100 V for 1 h. A protein marker standard (10 µL; Bio-Rad Precision Plus Unstained Marker) was also loaded on each gel. Proteins were visualised using colloidal Coomassie blue staining protocols, and sample lanes were excised and fractionated into eight equal fractions for subsequent in-gel digestion. Gel fractions were finely chopped and transferred to fresh 1.5 mL microcentrifuge tubes, and re-equilibrated in 200 µL of 100 mM NH4HCO3 before three destaining washes of 200 µL of 50% acetonitrile (ACN)/50 mM NH4HCO3 for 10 min each, then dehydrated with a wash of 100% ACN. The ACN was removed and gel fractions were rehydrated and reduced in 50 µL of a reducing solution consisting of 10 mM dithiothreitol (DTT) in 100 mM NH4HCO3 for 1 h at room temperature. DTT solution was removed and replaced with an alkylating solution of 55 mM iodoacetamide in 100 mM NH4HCO3 and incubated in the dark for 1 h at ambient temperature. Iodoacetamide solution was decanted, and gel pieces were re-equilibrated with a wash of 100 µL of 100 mM NH4HCO3 for 10 min, then two washes with 200 µL of 50% ACN/50 mM NH4HCO3, prior to dehydration in 100% ACN. In-gel trypsin digestion was performed by rehydrating gel pieces in 30 µL of trypsin solution (10 ng/µL Promega Sequencing Grade Modified Trypsin in 50 mM NH4HCO3) for 30 min at 4°C, then incubating overnight at 37°C. Resultant peptides were extracted in 30 µL of 50% ACN/2% formic acid (FA), followed by two more extractions with 70% ACN/2% FA, and 90% ACN/2% FA, respectively. 
Peptide extracts were vacuum centrifuged to dryness and reconstituted in 10 µL of 1% FA for mass spectrometric analysis via nanoflow liquid chromatography – tandem mass spectrometry (nanoLC-MS/MS).

NanoLC-MS/MS of Extracted Peptides

A shotgun proteomics workflow was applied with minor modification, as previously described [1,5,6]. Briefly, peptides were analysed using a Thermo Q-Exactive Orbitrap mass spectrometer coupled to a Thermo Easy-nLC1000 system (Thermo Scientific, San Jose, CA). Reversed-phase chromatographic separation was performed using a C18 HALO column (2.7 µm bead size, 160 Å pore size; 75 µm internal diameter × 75 mm length) with a 60 min linear gradient from 1% to 50% solvent B (99.9% ACN/0.1% FA) in solvent A (0.1% FA). Data-dependent acquisition mode was configured to automatically switch from Orbitrap MS to MS/MS mode. Peptide spectra were acquired over an m/z range of 350–1600 at a resolution of 35,000 with an isolation window of 3.0 m/z. Higher energy collisional dissociation (HCD) fragmentation was performed at 30% normalised HCD collision energy on the ten most abundant ions. Dynamic exclusion of target ions was set for 20 s, and fragment ions were analysed in the Orbitrap at a resolution of 17,500. Analytical blanks were analysed between samples to minimise system carryover of abundant peptides.
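
The precursor selection logic described above (ten most abundant ions per survey scan, 20 s dynamic exclusion) can be illustrated with a short Python sketch. This is a simplified model for readers, not instrument or vendor code: the function name `select_precursors`, the `(m/z, intensity)` peak tuples, and the matching of ions by rounded m/z are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

TOP_N = 10          # precursors fragmented per survey scan (from the text)
EXCLUSION_S = 20.0  # dynamic exclusion duration in seconds (from the text)


def select_precursors(
    scan_time_s: float,
    peaks: List[Tuple[float, float]],
    excluded_until: Dict[float, float],
) -> List[float]:
    """Pick up to TOP_N of the most intense peaks that are not currently excluded.

    Simplification (assumption): ions are matched by m/z rounded to two
    decimals, whereas a real instrument applies a mass tolerance window.
    """
    chosen: List[float] = []
    # Walk through the peaks from most to least intense.
    for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        key = round(mz, 2)
        if excluded_until.get(key, float("-inf")) > scan_time_s:
            continue  # still inside its exclusion window
        chosen.append(key)
        excluded_until[key] = scan_time_s + EXCLUSION_S
        if len(chosen) == TOP_N:
            break
    return chosen
```

Calling the function repeatedly with the same `excluded_until` dictionary shows why dynamic exclusion helps: once the most abundant ions have been fragmented, lower-abundance ions are selected in the following scans until the exclusion window expires.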

Peptide-to-Spectrum Matching Using the X! Tandem Algorithm

Raw spectral files were converted to mzXML format using the MSConvert graphical user interface [7,8] for peptide-to-spectrum matching (PSM) using the X! Tandem algorithm operating under the GPM interface software (version 3.0, https://www.thegpm.org/) [9,10]. Files were searched against the curated SwissProt Human Protein database (downloaded January 2019; 20,329 proteins) and the Common Repository of Adventitious Proteins (cRAP) database containing sequence data for common laboratory contaminants and protein standards [11]. PSM was performed with the following specifications: Orbitrap method including ±20 ppm parent ion mass tolerance, ±0.1 Da fragment ion mass tolerance, trypsin as digestion enzyme allowing for a maximum of two missed cleavages, peptide search length of between 6 and 50 amino acids with allowance for up to 4+ charge states, carbamidomethylation of cysteine/selenocysteine as a fixed modification, and oxidation of methionine and tryptophan, hydroxylation of proline, and deamidation of asparagine and glutamine as potential variable modifications. PSM was also conducted against a reversed database for assessment of false discovery rates (FDR). Each of the eight sample fractions was processed sequentially, and the results were then merged to generate a collated non-redundant sample output file. The ‘HumeralShaft_total’ data represent all 24 gel fractions compiled into a single GPM search. GPM protein and peptide output files were exported from the graphical user interface as .xlsx files. Data were manually filtered at a protein expectation score log(e) cut-off value of -3, including retention of high-quality single peptide-based protein assignments, to achieve a peptide-level FDR of <1%. Details of the filtered protein and peptide-level data can be found in the Supplementary Data of the original publication [1] and on GitHub (https://github.com/dymult/DataInBrief_Deamidation).
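
Three of the numerical criteria above (the ±20 ppm parent ion tolerance, the log(e) cut-off of -3, and the reversed-database FDR check) can be made concrete with a minimal Python sketch. The filtering in the study was performed manually on the GPM output, so this is only an illustration. The record fields `log_e` and `reversed` are hypothetical names, the inclusive cut-off (`<=`) is an assumption, and the FDR is computed with the common decoy/target convention, which may differ from the exact calculation used by the authors.

```python
from typing import Dict, List, Tuple


def ppm_window(mz: float, tol_ppm: float = 20.0) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds for a +/- tol_ppm mass tolerance."""
    delta = mz * tol_ppm / 1e6
    return (mz - delta, mz + delta)


def filter_by_log_e(rows: List[Dict], cutoff: float = -3.0) -> List[Dict]:
    """Keep identifications whose log(e) score is at or below the cut-off."""
    return [r for r in rows if r["log_e"] <= cutoff]


def decoy_fdr(rows: List[Dict]) -> float:
    """Estimate FDR as reversed (decoy) hits divided by forward (target) hits."""
    decoys = sum(1 for r in rows if r["reversed"])
    targets = len(rows) - decoys
    return decoys / targets if targets else 0.0
```

For example, a precursor at m/z 1000 with a 20 ppm tolerance is matched within ±0.02 m/z. Lowering the log(e) cut-off removes low-confidence identifications, and the decoy estimate can then be recomputed on the filtered list to check it against the <1% target.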

Assessment of Deamidation Ratios

The deamidation ratios of the ancient and modern samples were assessed using the R coding language (version 3.6.3) [12] operating in the RStudio integrated development environment (version 1.3.1073). The R script, input data files, and graphical outputs can be found in Supplementary Data S1 and on GitHub (https://github.com/dymult/DataInBrief_Deamidation). The R script works as follows.

Empty data frames called deamtable, ker and nker are created to store deamidation ratios overall and within protein groups (in this instance, keratins and non-keratins). All Excel files within the current directory are read in using readxl::read_excel and put into the variable files. Sample names are read from the Excel file names, removing the .xlsx extension. Each file is run through a loop to calculate and collate the deamidation ratios. Four columns are extracted from the file and stored in input: peptide sequence (sequence), observed modifications (modifications), protein accession ID (protein), protein description (description). Contam contains a list of common contaminants (proteins identified from the cRAP database) and reversed identifications, which are removed in the next command using grepl, with the result stored in filt. The GPM modifications column contains strings separated by semicolons, so strsplit with a ‘;’ separator and unlist are used to create a large character vector (mod) of modifications. This character vector is subset into deam using grepl, searching for “deamidated” to find all instances of deamidation within the peptide sequences. The numbers of deamidated glutamine and asparagine residues are counted with grepl and stored in the variables eventQ and eventN, respectively. The total numbers of glutamine and asparagine residues within the peptide sequences column are counted with stringr::str_count and stored in numQ and numN, respectively. The ratios of deamidated glutamine and asparagine residues are calculated and stored in ratioQ and ratioN, respectively. For each file, a row is created in deamtable with the respective file name and the deamidation ratios for glutamine and asparagine.

Within the same loop, the files are separated into keratins (ker) and non-keratins (notker) using grepl. Both sets of data are listed into data frames. A second loop is run over the two data frames. Within this loop, deamidation ratios for each data set are calculated and collated, separately from the overall deamidation ratios, using the same method as described above. For each data set, a row is created in ker and nker with the file name and the deamidation ratios for glutamine and asparagine. The keratin and non-keratin glutamine and asparagine deamidation ratios per file are stored in pgdeam. The per-file loop finishes here.

The deamtable data frame with the overall deamidation ratios is converted from wide to long format using reshape2::melt, with sample name as the ID variable and ratio as the value name. The ratio column is converted to a numeric variable and the second column is renamed ‘amino_acid’. Two descriptive columns called ‘age’ and ‘residue’ are generated using grepl to differentiate between ancient and modern samples and between glutamine and asparagine residues. A descriptive column called ‘sampleID’ is generated to simplify the plotting labels for the X axis. The pgdeam data frame with the protein group deamidation ratios is converted from wide to long format as described above (pgdeammelt), and the ‘amino_acid’, ‘age’, ‘residue’ and ‘sampleID’ columns are added. Additionally, a descriptive column called ‘protein’ is generated using grepl to differentiate between keratin and non-keratin ratios.

The samples in this study were the interior and exterior surfaces of various cranial and other bones. To assess a sample-specific deamidation value, the data for both surfaces of each bone sample, and for all three modern samples, were averaged as follows. A summary function (summary_func) was written to calculate the mean and standard deviation within the samples to produce an overall picture of the deamidation of non-keratin proteins. The keratin proteins were filtered out of pgdeammelt using subset to create nkdeammelt. The summary_func function was applied over nkdeammelt using plyr::ddply, with the columns ‘sampleID’ and ‘residue’ as the groupings, and the result was stored in avdeammelt.
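
The core of the calculation described above (remove contaminants and reversed hits, count deamidation events, divide by the total number of N and Q residues, then average by sample and residue) can be sketched in a few lines of Python. The authors' actual script is written in R and is available in the GitHub repository; the sketch below is not a translation of it. The modification string format (`"N2 Deamidated"`), the contaminant markers, and all function names are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Tuple


def drop_contaminants(peptides: List[Dict], markers=("reversed", "crap")) -> List[Dict]:
    """Remove rows whose protein field contains a contaminant/decoy marker (hypothetical markers)."""
    return [p for p in peptides
            if not any(m in p["protein"].lower() for m in markers)]


def deamidation_ratios(peptides: List[Dict]) -> Dict[str, float]:
    """Return deamidated/total ratios for asparagine (N) and glutamine (Q)."""
    events = {"N": 0, "Q": 0}
    totals = {"N": 0, "Q": 0}
    for row in peptides:
        for residue in totals:
            totals[residue] += row["sequence"].count(residue)
        # Modifications are semicolon-separated, as described for the GPM output.
        for mod in (m.strip() for m in row["modifications"].split(";")):
            if "deamidated" not in mod.lower():
                continue
            match = re.match(r"([NQ])\d+", mod)  # assumed "N2 Deamidated" style
            if match:
                events[match.group(1)] += 1
    return {r: (events[r] / totals[r] if totals[r] else 0.0) for r in totals}


def summarise_by_group(rows: List[Dict]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Mean and sample standard deviation of 'ratio' per (sampleID, residue)."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["sampleID"], r["residue"])].append(r["ratio"])
    return {k: (mean(v), stdev(v) if len(v) > 1 else 0.0)
            for k, v in groups.items()}
```

The grouping step plays the role that plyr::ddply with summary_func plays in the description above. The sample standard deviation (n - 1 denominator) is used here because that is what R's sd function computes.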

Data Visualisation

A PNG-wrapped ggplot2 grouped bar chart was created using the deammelt data with X axis ‘sample_name’, Y axis 1-‘ratio’ (percentage of non-deamidated residues) and coloured by residue. The chart was faceted by the ‘age’ column. The output file was called ‘bulkdeam.png’. A PNG-wrapped ggplot2 grouped bar chart was created using the pgdeammelt data with X axis ‘sample_name’, Y axis 1-‘ratio’ (percentage of non-deamidated residues) and coloured by residue. The chart was faceted by both the ‘protein’ and ‘age’ columns. The output file was called ‘pgdeam.png’. A PNG-wrapped ggplot2 grouped bar chart was created using the avdeammelt data with X axis ‘sampleID’, Y axis 1-‘mean’ (average percentage of non-deamidated residues) and coloured by residue. Error bars of one standard deviation were plotted using ggplot2::geom_errorbar. The output file was called ‘avdeamnonkeratin.png’. Examples of these graphical outputs can be found in the Supplementary Data of the original publication [1], and also in the GitHub repository referred to above.

Ethics Statements

Ethics clearance was not required for the ancient samples analysed in this study as the remains were known to be several thousand years old and no living relatives are known. A letter of exemption was issued by the Macquarie University Ethics Committee (Reference No. 5201849176758). Modern skin samples were collected by DHM from his own forearm and are exempt from ethics approval on the grounds of self-experimentation.

CRediT authorship contribution statement

Dylan H. Multari: Conceptualization, Methodology, Investigation, Formal analysis, Data curation, Writing – original draft, Visualization. Prathiba Ravishankar: Conceptualization, Writing – review & editing. Geraldine J. Sullivan: Software, Formal analysis, Visualization, Writing – original draft. Ronika K. Power: Supervision, Writing – review & editing. Constance Lord: Resources, Writing – review & editing. James A. Fraser: Resources, Writing – review & editing. Paul A. Haynes: Supervision, Conceptualization, Resources, Writing – review & editing, Project administration.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Specifications Table

Subject	Omics: Proteomics
Specific subject area	Modern human skin surrogate and ancient human bone surfaces sampled with dermatology-grade skin sampling strips.
Type of data	Table, Figure, Graph
How the data were acquired	nanoLC-MS/MS, bottom-up proteomics. Thermo Easy-nLC1000 system coupled to a Thermo Q-Exactive Orbitrap mass spectrometer (Thermo Scientific, San Jose, CA). HALO C18 column (160 Å pore size, 2.7 µm bead size, 75 µm internal diameter × 75 mm length). A 60 min reversed-phase linear gradient from 1% to 50% solvent B (99.9% acetonitrile/0.1% formic acid) in solvent A (0.1% formic acid). X! Tandem search algorithm running under Global Proteome Machine (version 3.0) software, searching the SwissProt Human Protein database (20,329 proteins).
Data format	Raw, Analysed, Filtered
Description of data collection	A modern human skin surrogate and the interior and exterior surfaces of ancient human bone fragments were sampled with D-Squame skin sampling tape strips. Sampling was performed with 3 strips for each bone fragment surface, except for the humeral shaft and modern skin, which were each sampled with a total of 9 strips (later divided into groups of 1, 3, and 5).
Data source location	Institution: Macquarie University; City/Town/Region: Sydney; Country: Australia
Data accessibility	Data is accessible from the ProteomeXchange Consortium via the PRIDE partner repository (dataset identifier PXD029003). Additional data is available with the article. Project Webpage: http://www.ebi.ac.uk/pride/archive/projects/PXD029003; FTP Download: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2022/04/PXD029003; GitHub Link: the R script used in this study, along with the collated GPM protein and peptide outputs, is publicly available at https://github.com/dymult/DataInBrief_Deamidation
Related research article	Multari, D.H., Ravishankar, P., Sullivan, G.J., Power, R.K., Lord, C., Fraser, J.A., Haynes, P.A., 2022. Development of a novel minimally invasive sampling and analysis technique using skin sampling tape strips for bioarchaeological proteomics, Journal of Archaeological Science 139, 105548. doi: 10.1016/j.jas.2022.105548.

10 in total

1. TANDEM: matching proteins with tandem mass spectra.

Authors: Robertson Craig; Ronald C Beavis
Journal: Bioinformatics Date: 2004-02-19 Impact factor: 6.937

2. Open source system for analyzing, validating, and storing protein identification data.

Authors: Robertson Craig; John P Cortens; Ronald C Beavis
Journal: J Proteome Res Date: 2004 Nov-Dec Impact factor: 4.466

3. Quantitative proteomics of heavy metal stress responses in Sydney rock oysters.

Authors: Sridevi Muralidharan; Emma Thompson; David Raftos; Gavin Birch; Paul A Haynes
Journal: Proteomics Date: 2012-03 Impact factor: 3.984

4. Common Repository of FBS Proteins (cRFP) To Be Added to a Search Database for Mass Spectrometric Analysis of Cell Secretome.

Authors: Jihye Shin; Yumi Kwon; Seonjeong Lee; Seungjin Na; Eun Young Hong; Shinyeong Ju; Hyun-Gyo Jung; Prashant Kaushal; Sungho Shin; Ji Hyun Back; Seon Young Choi; Eun Hee Kim; Su Jin Lee; Yae Eun Park; Hee-Sung Ahn; Younghee Ahn; Mohammad Humayun Kabir; Seong-Jun Park; Won Suk Yang; Jeonghun Yeom; Oh Young Bang; Chul-Won Ha; Jin-Won Lee; Un-Beom Kang; Hye-Jung Kim; Kang-Sik Park; J Eugene Lee; Ji Eun Lee; Jin Young Kim; Kwang Pyo Kim; Youngsoo Kim; Hisashi Hirano; Eugene C Yi; Je-Yoel Cho; Eunok Paek; Cheolju Lee
Journal: J Proteome Res Date: 2019-09-10 Impact factor: 4.466

Review 5. A guide to ancient protein studies.

Authors: Jessica Hendy; Frido Welker; Beatrice Demarchi; Camilla Speller; Christina Warinner; Matthew J Collins
Journal: Nat Ecol Evol Date: 2018-03-26 Impact factor: 15.460

6. ProteoWizard: open source software for rapid proteomics tools development.

Authors: Darren Kessner; Matt Chambers; Robert Burke; David Agus; Parag Mallick
Journal: Bioinformatics Date: 2008-07-07 Impact factor: 6.937

7. Manipulating root water supply elicits major shifts in the shoot proteome.

Authors: Mehdi Mirzaei; Neda Soltani; Elham Sarhadi; Iniga S George; Karlie A Neilson; Dana Pascovici; Shila Shahbazian; Paul A Haynes; Brian J Atwell; Ghasem Hosseini Salekdeh
Journal: J Proteome Res Date: 2013-12-03 Impact factor: 4.466

8. A cross-platform toolkit for mass spectrometry and proteomics.

Authors: Matthew C Chambers; Brendan Maclean; Robert Burke; Dario Amodei; Daniel L Ruderman; Steffen Neumann; Laurent Gatto; Bernd Fischer; Brian Pratt; Jarrett Egertson; Katherine Hoff; Darren Kessner; Natalie Tasman; Nicholas Shulman; Barbara Frewen; Tahmina A Baker; Mi-Youn Brusniak; Christopher Paulse; David Creasy; Lisa Flashner; Kian Kani; Chris Moulding; Sean L Seymour; Lydia M Nuwaysir; Brent Lefebvre; Frank Kuhlmann; Joe Roark; Paape Rainer; Suckau Detlev; Tina Hemenway; Andreas Huhmer; James Langridge; Brian Connolly; Trey Chadick; Krisztina Holly; Josh Eckels; Eric W Deutsch; Robert L Moritz; Jonathan E Katz; David B Agus; Michael MacCoss; David L Tabb; Parag Mallick
Journal: Nat Biotechnol Date: 2012-10 Impact factor: 54.908

9. The PRIDE database and related tools and resources in 2019: improving support for quantification data.

Authors: Yasset Perez-Riverol; Attila Csordas; Jingwen Bai; Manuel Bernal-Llinares; Suresh Hewapathirana; Deepti J Kundu; Avinash Inuganti; Johannes Griss; Gerhard Mayer; Martin Eisenacher; Enrique Pérez; Julian Uszkoreit; Julianus Pfeuffer; Timo Sachsenberg; Sule Yilmaz; Shivani Tiwary; Jürgen Cox; Enrique Audain; Mathias Walzer; Andrew F Jarnuczak; Tobias Ternent; Alvis Brazma; Juan Antonio Vizcaíno
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

10. Tape Stripping Technique for Stratum Corneum Protein Analysis.

Authors: Maja-Lisa Clausen; H-C Slotved; Karen A Krogfelt; Tove Agner
Journal: Sci Rep Date: 2016-01-28 Impact factor: 4.379
