Alison M Devault1,2, Tatum D Mortimer3,4, Andrew Kitchen5, Henrike Kiesewetter6, Jacob M Enk1,2, G Brian Golding7, John Southon8, Melanie Kuch1, Ana T Duggan1, William Aylward9,10, Shea N Gardner11, Jonathan E Allen11, Andrew M King12, Gerard Wright12, Makoto Kuroda13, Kengo Kato13, Derek Eg Briggs14, Gino Fornaciari15, Edward C Holmes16, Hendrik N Poinar1,7,12,17, Caitlin S Pepperell3,9,18. 1. McMaster Ancient DNA Centre, Department of Anthropology, McMaster University, Hamilton, Canada. 2. MYcroarray, Ann Arbor, United States. 3. Department of Medical Microbiology and Immunology, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, United States. 4. Microbiology Doctoral Training Program, University of Wisconsin-Madison, Madison, United States. 5. Department of Anthropology, University of Iowa, Iowa City, United States. 6. Project Troia, Institute of Prehistory, Early History, and Medieval Archaeology, Tübingen University, Tübingen, Germany. 7. Department of Biology, McMaster University, Hamilton, Canada. 8. Keck Carbon Cycle Accelerator Mass Spectrometer, Earth Systems Science Department, University of California, Irvine, United States. 9. Molecular Archaeology Laboratory, Biotechnology Center, University of Wisconsin-Madison, Madison, United States. 10. Department of Classics and Ancient Near Eastern Studies, University of Wisconsin-Madison, Madison, United States. 11. Lawrence Livermore National Laboratory, Livermore, United States. 12. Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Canada. 13. Laboratory of Bacterial Genomics, Pathogen Genomics Center, National Institute of Infectious Diseases, Tokyo, Japan. 14. Department of Geology and Geophysics, Yale University, New Haven, United States. 15. Division of Paleopathology, Department of Translational Research on New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy. 16. Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, Australia. 17. Humans and the Microbiome Program, Canadian Institute for Advanced Research, Toronto, Canada. 18. Department of Medicine (Infectious Diseases), School of Medicine and Public Health, University of Wisconsin-Madison, Madison, United States.
Abstract
Pregnancy complications are poorly represented in the archeological record, despite their importance in contemporary and ancient societies. While excavating a Byzantine cemetery in Troy, we discovered calcified abscesses among a woman's remains. Scanning electron microscopy of the tissue revealed 'ghost cells', resulting from dystrophic calcification, which preserved ancient maternal, fetal and bacterial DNA of a severe infection, likely chorioamnionitis. Gardnerella vaginalis and Staphylococcus saprophyticus dominated the abscesses. Phylogenomic analyses of ancient, historical, and contemporary data showed that G. vaginalis Troy fell within contemporary genetic diversity, whereas S. saprophyticus Troy belongs to a lineage that does not appear to be commonly associated with human disease today. We speculate that the ecology of S. saprophyticus infection may have differed in the ancient world as a result of close contacts between humans and domesticated animals. These results highlight the complex and dynamic interactions with our microbial milieu that underlie severe maternal infections.
Pregnancy complications are poorly represented in the archeological record, despite their importance in contemporary and ancient societies. While excavating a Byzantine cemetery in Troy, we discovered calcified abscesses among a n class="Species">woman's remains. Scanning electron microscopy of the tissue revealed 'ghost cells', resulting from dystrophic calcification, which preserved ancient maternal, fetal and bacterial DNA of a severe infection, likely chorioamnionitis. Gardnerella vaginalis and Staphylococcus saprophyticus dominated the abscesses. Phylogenomic analyses of ancient, historical, and contemporary data showed that G. vaginalisTroy fell within contemporary genetic diversity, whereas S. saprophyticusTroy belongs to a lineage that does not appear to be commonly associated with human disease today. We speculate that the ecology of S. saprophyticusinfection may have differed in the ancient world as a result of close contacts between humans and domesticated animals. These results highlight the complex and dynamic interactions with our microbial milieu that underlie severe maternal infections.
During excavations of a Late Byzantine era cemetery at the periphery of the ancient city of Troy, Anatolia (in present day n class="Species">Turkey) (Figure 1—figure supplement 1), we discovered two calcified nodules among a woman’s remains. The woman was estimated to be 30 (±5y) at the time of death (Appendix). She was found alone in a stone-lined grave (Figure 1A) within the graveyard of a farming community (Kiesewetter, 2014). The nodules, which are 2–3 cm in diameter and composed of concentric layers (Figure 1B), were discovered at the base of the ribs. Radiocarbon dating of the decedent’s ulna yielded 790-860y BP (Supplementary file 1A), in agreement with the archeological assessment of the age of the cemetery (early 13th century AD, Appendix).
Figure 1—figure supplement 1.
Map of Troy showing the cemetery in Grid Square x24 and areas of excavation 1988–2012.
Areas of excavation are in gray and the cemetery is marked with a red square. North is at the top of the plan.
DOI:
http://dx.doi.org/10.7554/eLife.20983.004
Figure 1.
Calcified nodule found among the skeletal remains at Troy.
(A) Burial x24.177 (grave 14, cemetery in quadrat x24). Photo credit Gebhard Bieg, 2005. (B) Cross-section of nodule (sample no x24.177), photo credit: Pathologie Nordhessen 2009. Scale represents 1 cm. (C) Location of Troy. Modern day Turkey is shaded in gray.
DOI:
http://dx.doi.org/10.7554/eLife.20983.003
Areas of excavation are in gray and the cemetery is marked with a red square. North is at the top of the plan.
DOI:
http://dx.doi.org/10.7554/eLife.20983.004
(A) Nodule one (Nod1_1h-UDG), 28,713,282 reads total (B) Nodule two (Nod2-UDG), 6,038,994 reads total.
DOI:
http://dx.doi.org/10.7554/eLife.20983.005
These FLDs were generated from the Ulna enriched libraries (A + B) as well as the non enriched nodule (C) using mapDamage2 (Jonsson et al., 2013) from merged nonUDG data sets assembled to the human mitochondrial reference genome (Andrews et al., 1999), NCBI accession NC_012920.
DOI:
http://dx.doi.org/10.7554/eLife.20983.006
Damage profiles of non-UDG treated (‘nonU’) as well as UDG treated merged reads assembled to the human mitochondrial rCRS reference genome (NC_012920) for (A) Ulna_Enr1-nonU round one human mitochondrial enrichment, (B) Ulna_Enr2-nonU round 2, and (C) a Nod1_1h-UDG reads (which have been UDG treated). Damage profiles were generated using mapDamage2.(Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.007
Damage profiles generated using mapDamage2 (Jonsson et al., 2013) of non-UDG treated (‘nonU’) reads from the NOD1_nonU and NOD2_nonU data set (total of 1,468,381 trimmed and merged reads) with minimum 35 bp length and map quality 30, mapping to (A) hg38 chrX, and (B) hg38 chrY and C) hg38 autosomes.
DOI:
http://dx.doi.org/10.7554/eLife.20983.008
Complete human mtDNA genomes assigned to haplogroup U3 (n = 137) were collected from GenBank and aligned with the Troy consensus sequence (highlighted in red). Tree was generated using BEAST v 1.856 and TreeAnnotator.57 Posterior probabilities are shown at nodes.
DOI:
http://dx.doi.org/10.7554/eLife.20983.009
The heatmap gives the log of the frequency of the most common taxa in each sample along the diagonal (if the most frequent is already shown, then second most frequent is added for that sample; Nares and Ear, Nod1_1h-UDG and Nod2-UDG). The taxa in order are - 76775, Malassezia restricta; 76773, Malassezia globosa; 729, Haemophilus parainfluenzae; 60133, Prevotella pallens; 28117, Alistipes putredinis; 47770, Lactobacillus crispatus; 562, Escherichia coli; 487, Neisseria meningitidis; 2001, Streptosporangium roseum; 29385, Staphylococcus saprophyticus; 2702, Gardnerella vaginalis.
DOI:
http://dx.doi.org/10.7554/eLife.20983.010
Taxa were identified using LMAT. PCA performed using prcomp function in R. Legend indicates the origin of the category and the number of samples combined into each category. The first principal component axis separates the placental and ancient samples from the remaining samples. The second principal component axis separates the Sediment-UDG and Ulna-UDG data sets, which likely contain soil contamination, from the remaining samples.
DOI:
http://dx.doi.org/10.7554/eLife.20983.011
DOI:
http://dx.doi.org/10.7554/eLife.20983.012
Calcified nodule found among the skeletal remains at Troy.
(A) Burial x24.177 (grave 14, cemetery in quadrat x24). Photo credit Gebhard Bieg, 2005. (B) Cross-section of nodule (sample no x24.177), photo credit: Pathologie Nordhessen 2009. Scale represents 1 cm. (C) Location of Troy. Modern day Turkey is shaded in gray.DOI:
http://dx.doi.org/10.7554/eLife.20983.003
Map of Troy showing the cemetery in Grid Square x24 and areas of excavation 1988–2012.
Areas of excavation are in gray and the cemetery is marked with a red square. North is at the top of the plan.DOI:
http://dx.doi.org/10.7554/eLife.20983.004
Metagenomic profiles of shotgun DNA libraries from nodules, based on BLAST analysis of all reads >35 bp length.
(A) Nodule one (Nod1_1h-UDG), 28,713,282 reads total (B) Nodule two (Nod2-UDG), 6,038,994 reads total.DOI:
http://dx.doi.org/10.7554/eLife.20983.005
Fragment length distributions for non-UDG treated human mitochondrial assemblies.
These FLDs were generated from the Ulna enriched libraries (A + B) as well as the non enriched nodule (C) using mapDamage2 (Jonsson et al., 2013) from merged nonUDG data sets assembled to the n class="Species">human mitochondrial reference genome (Andrews et al., 1999), NCBI accession NC_012920.
DOI:
http://dx.doi.org/10.7554/eLife.20983.006
Ancient DNA damage assessment of human mitochondrial reads.
Damage profiles of non-UDG treated (‘nonU’) as well as n class="Gene">UDG treated merged reads assembled to the human mitochondrial rCRS reference genome (NC_012920) for (A) Ulna_Enr1-nonU round one human mitochondrial enrichment, (B) Ulna_Enr2-nonU round 2, and (C) a Nod1_1h-UDG reads (which have been UDG treated). Damage profiles were generated using mapDamage2.(Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.007
Ancient DNA damage assessment of reads mapped to hg38 chrX, chrY and autosomes.
Damage profiles generated using mapDamage2 (Jonsson et al., 2013) of non-UDG treated (‘nonU’) ren class="Disease">ads from the NOD1_nonU and NOD2_nonU data set (total of 1,468,381 trimmed and merged reads) with minimum 35 bp length and map quality 30, mapping to (A) hg38 chrX, and (B) hg38 chrY and C) hg38 autosomes.
DOI:
http://dx.doi.org/10.7554/eLife.20983.008
Haplogroup U3 Bayesian Maximum Clade Credibility tree.
Complete human mtDn class="Chemical">NA genomes assigned to haplogroup U3 (n = 137) were collected from GenBank and aligned with the Troy consensus sequence (highlighted in red). Tree was generated using BEAST v 1.856 and TreeAnnotator.57 Posterior probabilities are shown at nodes.
DOI:
http://dx.doi.org/10.7554/eLife.20983.009
Heatmap of most common taxa in metagenomic samples.
The heatmap gives the log of the frequency of the most common taxa in each sample along the diagonal (if the most frequent is already shown, then second most frequent is n class="Disease">added for that sample; Nares and Ear, Nod1_1h-UDG and Nod2-UDG). The taxa in order are - 76775, Malassezia restricta; 76773, Malassezia globosa; 729, Haemophilus parainfluenzae; 60133, Prevotella pallens; 28117, Alistipes putredinis; 47770, Lactobacillus crispatus; 562, Escherichia coli; 487, Neisseria meningitidis; 2001, Streptosporangium roseum; 29385, Staphylococcus saprophyticus; 2702, Gardnerella vaginalis.
DOI:
http://dx.doi.org/10.7554/eLife.20983.010
PCA of Human Microbiome Project and ancient metagenomic taxa.
Taxa were identified using LMAT. PCA performed using prcomp function in R. Legend indicates the origin of the category and the number of samples combined into each category. The first principal component axis separates the placental and ancient samples from the remaining samples. The second principal component axis separates the Sediment-n class="Gene">UDG and Ulna-UDG data sets, which likely contain soil contamination, from the remaining samples.
DOI:
http://dx.doi.org/10.7554/eLife.20983.011
Sketch of skeletal preservation.
DOI:
http://dx.doi.org/10.7554/eLife.20983.012Nodule one (Figure 2—figure supplen class="Species">ment 1, Supplementary file 1B) is primarily composed of two phosphate phases, hydroxylapatite (bioapatite) and whitlockite (as well as small amounts of calcite), both of which have been found in human calcified pathological concretions (Lagier and Baud, 2003). Based on their size and concentric layered structure, the nodules could be urinary stones. However, struvite (magnesium ammonium phosphate) and calcium oxalate, common constituents of urinary stones, were absent in both XRD and SEM-EDS analyses (Supplementary file 1B-D). SEM of the nodules (Figure 2, Figure 2—figure supplement 2) revealed aggregates of spherical structures with dimensions typical of bacterial cells, as well as extracellular polymeric substances (EPS – a glycocalyx secreted by the cells during biofilm formation [Decho and Thiel, 2011]).
Figure 2—figure supplement 1.
XRD analysis of nodule.
A) Video alignment and XRD frames. Left; crosshairs indicate the center of the region examined. Right; the four frames collected to obtain a 2θ range of 8–103°. At a low angle, air scatter from the main beam is evident. B) Background subtracted powder pattern. C) Search/match results.
DOI:
http://dx.doi.org/10.7554/eLife.20983.014
Figure 2.
SEM image of nodule at (A) 2000x and (B) 20,000x magnification.
Bacterial cells indicated with red arrow are between ~1 µm and 2 µm (within range expected for Staphylococcus). Extracellular polymeric substances (EPS) are indicated by yellow arrows.
DOI:
http://dx.doi.org/10.7554/eLife.20983.013
A) Video alignment and XRD frames. Left; crosshairs indicate the center of the region examined. Right; the four frames collected to obtain a 2θ range of 8–103°. At a low angle, air scatter from the main beam is evident. B) Background subtracted powder pattern. C) Search/match results.
DOI:
http://dx.doi.org/10.7554/eLife.20983.014
Possible inflammatory (neutrophils) cells indicated by blue arrows.
DOI:
http://dx.doi.org/10.7554/eLife.20983.015
Figure 2—figure supplement 2.
SEM image of nodule at 10,000x magnification.
Possible inflammatory (neutrophils) cells indicated by blue arrows.
DOI:
http://dx.doi.org/10.7554/eLife.20983.015
SEM image of nodule at (A) 2000x and (B) 20,000x magnification.
Bacterial cells indicated with red arrow are between ~1 µm and 2 µm (within range expected for Staphylococcus). Extracellular polymeric substances (EPS) are indicated by yellow arrows.DOI:
http://dx.doi.org/10.7554/eLife.20983.013
XRD analysis of nodule.
A) Video alignment and XRD frames. Left; crosshairs indicate the center of the region examined. Right; the four frames collected to obtain a 2θ range of 8–103°. At a low angle, air scatter from the main beam is evident. B) Background subtracted powder pattern. C) Search/match results.DOI:
http://dx.doi.org/10.7554/eLife.20983.014
SEM image of nodule at 10,000x magnification.
Possible inflammatory (neutrophils) cells indicated by blue arrows.DOI:
http://dx.doi.org/10.7554/eLife.20983.015We extracted DNA from both nodules and mn class="Disease">ade Uracil DNA Glycosylase (UDG) and non-UDG treated dsDNA libraries. Shotgun sequences from all libraries yielded astonishingly high proportions of endogenous human and bacterial DNA: 24–48% human, 37–66% S. saprophyticus, and 5–7% G. vaginalis (Figure 1—figure supplements 2–5).
Figure 1—figure supplement 2.
Metagenomic profiles of shotgun DNA libraries from nodules, based on BLAST analysis of all reads >35 bp length.
(A) Nodule one (Nod1_1h-UDG), 28,713,282 reads total (B) Nodule two (Nod2-UDG), 6,038,994 reads total.
DOI:
http://dx.doi.org/10.7554/eLife.20983.005
Figure 1—figure supplement 5.
Ancient DNA damage assessment of reads mapped to hg38 chrX, chrY and autosomes.
Damage profiles generated using mapDamage2 (Jonsson et al., 2013) of non-UDG treated (‘nonU’) reads from the NOD1_nonU and NOD2_nonU data set (total of 1,468,381 trimmed and merged reads) with minimum 35 bp length and map quality 30, mapping to (A) hg38 chrX, and (B) hg38 chrY and C) hg38 autosomes.
DOI:
http://dx.doi.org/10.7554/eLife.20983.008
From these data, we reconstructed a human mitochondrial genome at 30.1x unique ren class="Disease">ad depth, the consensus of which belongs to haplotype U3b3. In phylogenetic analyses of the mitogenome from Troy and modern mitogenomes, the Troy sample groups most closely with those from the Caucasus and Middle East, both of which were within the eastern limits of Late Byzantine influence (Figure 1—figure supplement 6).
Figure 1—figure supplement 6.
Haplogroup U3 Bayesian Maximum Clade Credibility tree.
Complete human mtDNA genomes assigned to haplogroup U3 (n = 137) were collected from GenBank and aligned with the Troy consensus sequence (highlighted in red). Tree was generated using BEAST v 1.856 and TreeAnnotator.57 Posterior probabilities are shown at nodes.
DOI:
http://dx.doi.org/10.7554/eLife.20983.009
To investigate whether the nodules belonged to the associated female individual, we extracted DNA from her ulna, constructed a dsDn class="Chemical">NA library, enriched for, sequenced, and reconstructed the mitogenome to an average unique coverage depth of 30.8x. The nodule and the ulna share the identical mitochondrial haplotype (Supplementary file 1E), indicating that they stem from the same individual or a maternal relative.
The metagenomic profile of the nodules suggests they derive from an amalgam of human and bacterial cells, as in an abscess. The high concentration of n class="Species">S. saprophyticus and G. vaginalis DNA suggests an origin in genitourinary tissue. To exclude an exogenous environmental source of the bacterial DNA and to further investigate the tissue of origin, we performed metagenomic profiling of the nodules, ulna and sediment from the gravesite. The similarity in abundance of G. vaginalis in the nodules and modern Human Microbiome Panel (HMP) vaginal samples (Figure 1—figure supplement 7) points to a likely origin for the nodules in the female reproductive tract. The metagenomic profile of the nodules (minus their associated blanks) is distinct from the sediment, whereas the reads from the ulna group closely with the sediment sample (Figure 1—figure supplement 8). These results indicate that the nodules were less prone to leaching of environmental DNA. Our SEM-EDS and XRD findings suggest that bacterial and inflammatory cells were replicated in calcium phosphate minerals (‘ghost cells’); it is likely that this mineralization provided a remarkable degree of protection from DNA degradation and environmental leaching as seen in the bones. The ectopic, inflammation-related calcification observed here is an apparently highly effective mechanism of bacterial fossilization that rivals mineralization occurring at much slower rates in the environment.
Figure 1—figure supplement 7.
Heatmap of most common taxa in metagenomic samples.
The heatmap gives the log of the frequency of the most common taxa in each sample along the diagonal (if the most frequent is already shown, then second most frequent is added for that sample; Nares and Ear, Nod1_1h-UDG and Nod2-UDG). The taxa in order are - 76775, Malassezia restricta; 76773, Malassezia globosa; 729, Haemophilus parainfluenzae; 60133, Prevotella pallens; 28117, Alistipes putredinis; 47770, Lactobacillus crispatus; 562, Escherichia coli; 487, Neisseria meningitidis; 2001, Streptosporangium roseum; 29385, Staphylococcus saprophyticus; 2702, Gardnerella vaginalis.
DOI:
http://dx.doi.org/10.7554/eLife.20983.010
Figure 1—figure supplement 8.
PCA of Human Microbiome Project and ancient metagenomic taxa.
Taxa were identified using LMAT. PCA performed using prcomp function in R. Legend indicates the origin of the category and the number of samples combined into each category. The first principal component axis separates the placental and ancient samples from the remaining samples. The second principal component axis separates the Sediment-UDG and Ulna-UDG data sets, which likely contain soil contamination, from the remaining samples.
DOI:
http://dx.doi.org/10.7554/eLife.20983.011
Sexing analyses of the remains (and associated blanks) using the method of Skoglund et al. (2013) assign the nodules as female -XX (Supplementary file 1F). More thorough analyses of the n class="Species">human DNA present in the nodules yielded an intriguing finding that helps pinpoint their tissue of origin. Shotgun sequencing data from the nodules contained a small number of reads (884) conservatively mapping to the Y chromosome (Supplementary file 1G). The length distributions of the reads overlapped with those mapping to the X chromosome and autosomes, suggesting an endogenous, ancient origin (Supplementary file 1G, Figure 1—figure supplement 5); we searched for but did not find similar bona fide ancient Y chromosome reads in sequence data from the ulna, the sediment, or any negative control (Supplementary file 1G). The presence of Y chromosome reads in the nodule but not in the ulna could be explained by a placental origin of the mineralized abscesses, indicating chorioamnionitis in the decedent while pregnant with a male fetus. Chorioamnionitis – inflammation and infection of the placenta and fetal membranes – involves an inflammatory response on the part of the fetus as well as the mother (Kraus et al., 2004), which would explain a female (maternal) origin of the nodular tissue with a minority male (fetal) component.
Chorioamnionitis is a mixed n class="Disease">infection in which vaginal bacteria reach the upper reproductive tract, placenta, and fetal membranes; G. vaginalis is often identified in infected tissues (Hillier et al., 1988). S. saprophyticus can be found among the genitourinary and gastrointestinal flora of healthy women (Ringertz and Torssander, 1986; Rupp et al., 1992; Schneider and Riley, 1996). It is a common cause of urinary tract infection (UTI) in reproductive aged women (Kahlmeter, 2003) and has also been known to cause puerperal infections (Arianpour et al., 2009).
To gain further insights into the pathogens associated with this historical genitourinary infection, we pooled ren class="Disease">ads from all nodule DNA libraries, mapped and reconstructed the ancient S. saprophyticus and G. vaginalis genomes and analyzed them in conjunction with existing and newly acquired genomic data from extant and historical organisms (Supplementary file 1H,I).
We used a combination of paired-end reference guided assembly and iterative assembly to reconstruct a nearly complete genome of S. saprophyticusn class="Gene">Troy, including >100 Kb of novel sequence compared to reference strain ATCC 15305. The genome is 2,471,881 bp long, with an average unique coverage depth of 298.6x (Figure 4—figure supplement 3), which represents an unprecedented, detailed and complete picture of an ancient pathogen genome from shotgun sequencing data. We also reconstructed a 22.6 Kb plasmid, pSST1.
Figure 4—figure supplement 3.
Genome coverage plots for pooled nodule shotgun libraries.
S. saprophyticus (NC_007350), average coverage 298.6X. All reads were restricted to minimum length of 35 bp and minimum map quality 30 with all duplicates removed. Figures depict coverage of the genome in 100 bp blocks across references. Concentric grey circles demarcate increments of 50X coverage in both plots.
DOI:
http://dx.doi.org/10.7554/eLife.20983.033
We were unable to reconstruct a contiguous G. vaginalis genome due to high variability in coverage and lack of synteny in both ancient and modern genomic data (Ahmed et al., 2012). Insten class="Disease">ad, we used a de novo approach to reconstruct G. vaginalisTroy gene content using reads that mapped to the annotated coding regions of all available G. vaginalis genomes. This enabled us to assess the gene content of our ancient genome compared to the modern strains. Using this method, we recovered 1187 unique contigs (total length 1,435,761 bp) corresponding to 972 annotated genes and an average unique coverage depth of 57.0x (Figure 3—figure supplement 3).
Figure 3—figure supplement 3.
Genome coverage plots for pooled nodule shotgun libraries.
G. vaginalis (NC_014644), average coverage 57.0X. All reads were restricted to minimum length of 35 bp and minimum map quality 30 with all duplicates removed. Figures depict coverage of the genome in 100 bp blocks across references. Concentric grey circles demarcate increments of 50X coverage in both plots.
DOI:
http://dx.doi.org/10.7554/eLife.20983.023
Our sample of 35 isolates of G. vaginalis was grouped into four previously defined cln class="Disease">ades (Figure 3, Figure 3—figure supplement 4), which have been proposed to represent distinct species (Ahmed et al., 2012). G. vaginalisTroy sits within Clade 1, among vaginal and endometrial isolates collected from both healthy women and patients with bacterial vaginosis. Interestingly, the 800-year-old sample from Troy (Turkey) falls within contemporary genetic diversity (Supplementary file 1I).
Figure 3.
Phylogenetic analysis of Gardnerella vaginalis.
A maximum likelihood tree estimated using RAxML(Stamatakis, 2014) (Figure 3—source data 3) from a core alignment of G. vaginalis genomes (Figure 3—source data 1, Figure 3—source data 2). Branches are colored based on clades originally identified in Ahmed et al.(Ahmed et al., 2012) (green = clade 1, blue = clade 2, red = clade 3, purple = clade 4). Tips from modern G. vaginalis isolates are labeled based on sample source (H = healthy vagina, BV = bacterial vaginosis, HE = healthy endometrium, E = endometrium, U = unknown). Lighter colored branches have bootstrap values less than 100. Clinical phenotypes are interspersed throughout the phylogeny, and the Troy genome is not associated with a consistently pathogenic lineage of G. vaginalis. Inset: Recombinant fragments in G. vaginalis core genome identified by BratNextGen (Figure 3—source data 4) (Marttinen, 2012). Each circle represents one genome. Colored blocks represent recombinant fragments, and colors correspond to the clade designations in the phylogenetic tree. Plot made with Circos (Krzywinski et al., 2009).
DOI:
http://dx.doi.org/10.7554/eLife.20983.016
DOI:
http://dx.doi.org/10.7554/eLife.20983.017
DOI:
http://dx.doi.org/10.7554/eLife.20983.018
DOI:
http://dx.doi.org/10.7554/eLife.20983.019
DOI:
http://dx.doi.org/10.7554/eLife.20983.020
Damage profiles of non-UDG treated (‘nonU’) reads from a pooled NOD1_nonU and NOD2_nonU data set (total of 1,565,548 trimmed reads >24 bp) mapping to G. vaginalis strain ATCC 14019. Paired end reads were mapped using bwa (Li and Durbin, 2009) with default settings and duplicates were removed with samtools rmdup (Li et al., 2009). Damage profiles were generated using mapDamage2 (Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.021
All nodule shotgun libraries (Nod1_1h-UDG, Nod1_1h-nonU, Nod2-UDG, Nod2-nonU) were pooled, reads were restricted to a minimum length of 35 bp and mapping quality of 30 and all duplicates removed both within and between libraries. The fragment length distribution of the remaining 1,658,978 reads was visualized using mapDamage2 (Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.022
G. vaginalis (NC_014644), average coverage 57.0X. All reads were restricted to minimum length of 35 bp and minimum map quality 30 with all duplicates removed. Figures depict coverage of the genome in 100 bp blocks across references. Concentric grey circles demarcate increments of 50X coverage in both plots.
DOI:
http://dx.doi.org/10.7554/eLife.20983.023
The network created in SplitsTree v 4 (Huson and Bryant, 2006) of Gardnerella vaginalis. The networks recapitulate the structure of maximum likelihood tree (Figure 3).
DOI:
http://dx.doi.org/10.7554/eLife.20983.024
Figure 3—figure supplement 4.
Neighbor net network of core genomes.
The network created in SplitsTree v 4 (Huson and Bryant, 2006) of Gardnerella vaginalis. The networks recapitulate the structure of maximum likelihood tree (Figure 3).
DOI:
http://dx.doi.org/10.7554/eLife.20983.024
Phylogenetic analysis of Gardnerella vaginalis.
A maximum likelihood tree estimated using RAxML(Stamatakis, 2014) (Figure 3—source data 3) from a core alignment of n class="Species">G. vaginalis genomes (Figure 3—source data 1, Figure 3—source data 2). Branches are colored based on clades originally identified in Ahmed et al.(Ahmed et al., 2012) (green = clade 1, blue = clade 2, red = clade 3, purple = clade 4). Tips from modern G. vaginalis isolates are labeled based on sample source (H = healthy vagina, BV = bacterial vaginosis, HE = healthy endometrium, E = endometrium, U = unknown). Lighter colored branches have bootstrap values less than 100. Clinical phenotypes are interspersed throughout the phylogeny, and the Troy genome is not associated with a consistently pathogenic lineage of G. vaginalis. Inset: Recombinant fragments in G. vaginalis core genome identified by BratNextGen (Figure 3—source data 4) (Marttinen, 2012). Each circle represents one genome. Colored blocks represent recombinant fragments, and colors correspond to the clade designations in the phylogenetic tree. Plot made with Circos (Krzywinski et al., 2009).
DOI:
http://dx.doi.org/10.7554/eLife.20983.016
Concatenated alignment of core genes in G. vaginalis.
DOI:
http://dx.doi.org/10.7554/eLife.20983.017
G. vaginalis core genome alignment trimmed with Gblocks.
DOI:
http://dx.doi.org/10.7554/eLife.20983.018
Maximum likelihood phylogenetic analysis of trimmed G. vaginalis alignment with RAxML.
DOI:
http://dx.doi.org/10.7554/eLife.20983.019
Recombinant fragments detected with BratNextGen in trimmed G. vaginalis alignment.
DOI:
http://dx.doi.org/10.7554/eLife.20983.020
Ancient DNA damage assessment of G. vaginalis.
Damage profiles of non-UDG treated (‘nonU’) ren class="Disease">ads from a pooled NOD1_nonU and NOD2_nonU data set (total of 1,565,548 trimmed reads >24 bp) mapping to G. vaginalis strain ATCC 14019. Paired end reads were mapped using bwa (Li and Durbin, 2009) with default settings and duplicates were removed with samtools rmdup (Li et al., 2009). Damage profiles were generated using mapDamage2 (Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.021
Fragment length distribution (FLD) for G. vaginalis ATCC 14019.
All nodule shotgun libraries (Nod1_1h-n class="Gene">UDG, Nod1_1h-nonU, Nod2-UDG, Nod2-nonU) were pooled, reads were restricted to a minimum length of 35 bp and mapping quality of 30 and all duplicates removed both within and between libraries. The fragment length distribution of the remaining 1,658,978 reads was visualized using mapDamage2 (Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.022
Genome coverage plots for pooled nodule shotgun libraries.
G. vaginalis (n class="Chemical">NC_014644), average coverage 57.0X. All reads were restricted to minimum length of 35 bp and minimum map quality 30 with all duplicates removed. Figures depict coverage of the genome in 100 bp blocks across references. Concentric grey circles demarcate increments of 50X coverage in both plots.
DOI:
http://dx.doi.org/10.7554/eLife.20983.023
Neighbor net network of core genomes.
The network created in SplitsTree v 4 (Huson and Bryant, 2006) of Gardnerella vaginalis. The networks recapitulate the structure of maximum likelihood tree (Figure 3).DOI:
http://dx.doi.org/10.7554/eLife.20983.024Consistent with prior reports (Ahmed et al., 2012), we identified extensive impacts of lateral gene transfer (LGT) on G. vaginalis diversity (Figure 3). Even in the core genome alignn class="Species">ment, which contains just 44% of per-isolate gene content, we estimate that 20% of sites have been affected by recombination. This high rate of recombination may help to explain the remarkable preservation of genetic diversity in G. vaginalis. A recent study of Helicobacter pylori, which has similarly high rates of LGT, found that genetic diversity within the species has been preserved for more than five thousand years (Maixner et al., 2016).
We discovered two distinct clades of n class="Species">S. saprophyticus (Figure 4, Figure 4—figure supplement 4), one of which (Clade P) appears to be more strongly associated with pathogenicity and includes our ancient S. saprophyticusTroy. Nineteen of twenty veterinary and humanclinical isolates belong to Clade P, an association that was statistically significant (Appendix). A second clade (Clade E) is made up of food and environmental isolates of S. saprophyticus, as well as a human UTI isolate from Japan.
Figure 4.
Phylogenetic analysis of Staphylococcus saprophyticus.
(A) Maximum likelihood tree estimated using RAxML (Stamatakis, 2014) (Figure 4—source data 3) from an alignment of S. saprophyticus genomes (Figure 4—source data 1, Figure 4—source data 2). Bootstrap values less than 100 are labeled. Silhouettes indicate bacterial sample source. Isolates without silhouettes are from human clinical samples isolated from urine. Color corresponds to country of isolation as seen on the map. Full sample descriptions are in Supplementary file 1H. (B) Source countries of bacterial samples. (C) Neighbor-net network of S. saprophyticus plasmid sequences (Figure 4—source data 4) related to pSST1 created in SplitsTree4 (Huson and Bryant, 2006). The boxed inset is an enlarged version of the portion of the network from Clade P isolates. Some S. saprophyticus isolates do not encode pSST1-like plasmids, and therefore, they are not included in the network. Starts and stops of recombinant regions of the alignment can be found in Figure 4—source data 5.
DOI:
http://dx.doi.org/10.7554/eLife.20983.025
DOI:
http://dx.doi.org/10.7554/eLife.20983.026
DOI:
http://dx.doi.org/10.7554/eLife.20983.027
DOI:
http://dx.doi.org/10.7554/eLife.20983.028
DOI:
http://dx.doi.org/10.7554/eLife.20983.029
DOI:
http://dx.doi.org/10.7554/eLife.20983.030
Damage profiles of non-UDG treated (‘nonU’) reads from a pooled NOD1_nonU and NOD2_nonU data set (total of 1,565,548 trimmed reads >24 bp) mapping to S. saprophyticus strain ATCC 15305. Paired end reads were mapped using bwa (Li and Durbin, 2009) with default settings and duplicates were removed with samtools rmdup (Li et al., 2009). Damage profiles were generated using mapDamage2 (Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.031
All nodule shotgun libraries (Nod1_1h-UDG, Nod1_1h-nonU, Nod2-UDG, Nod2-nonU) were pooled, reads were restricted to a minimum length of 35 bp and mapping quality of 30 and all duplicates removed both within and between libraries. The fragment length distribution of the remaining 3,904,552 reads was visualized using mapDamage2 (Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.032
S. saprophyticus (NC_007350), average coverage 298.6X. All reads were restricted to minimum length of 35 bp and minimum map quality 30 with all duplicates removed. Figures depict coverage of the genome in 100 bp blocks across references. Concentric grey circles demarcate increments of 50X coverage in both plots.
DOI:
http://dx.doi.org/10.7554/eLife.20983.033
Networks created in SplitsTree v 4 (Huson and Bryant, 2006) of S. saprophyticus. The networks recapitulate the structure of maximum likelihood trees (Figure 4).
DOI:
http://dx.doi.org/10.7554/eLife.20983.034
Novobiocin resistance is conferred by a glycine at position 85 and lysine at position 140 (Vickers et al., 2007), which is present in all S. saprophyticus genomes examined here. SSP1924 and fosB confer streptomycin and fosfomycin resistance, respectively, and are encoded in vSs15305 in the ATCC 15305 reference genome (Kuroda et al., 2005). While none of the other isolates encode the entire genomic island, fosB and SSP1924 are found in isolates from both Clade P and Clade E. The canine isolate (K) harbors SCCmec containing mecA conferring methicillin resistance that has been identified in human clinical isolates of S. saprophyticus (Higashide et al., 2008).
DOI:
http://dx.doi.org/10.7554/eLife.20983.035
Each circle in the figure represents one isolate. Regions with significant evidence for recombination are shown as black or colored blocks. Black ticks mark intervals of 20 kb, and positions are in reference to ATCC15305. 17.9% of the alignment is recombinant in at least one strain. After removing fragments associated with known MGEs, 15.0% of sites are recombinant in the core genome. Isolates are colored according to clade (purple- Clade E, green- bovine, blue- Troy, black- Clade P).
DOI:
http://dx.doi.org/10.7554/eLife.20983.036
Figure 4—figure supplement 4.
Neighbor net network of core genomes.
Networks created in SplitsTree v 4 (Huson and Bryant, 2006) of S. saprophyticus. The networks recapitulate the structure of maximum likelihood trees (Figure 4).
DOI:
http://dx.doi.org/10.7554/eLife.20983.034
Phylogenetic analysis of Staphylococcus saprophyticus.
(A) Maximum likelihood tree estimated using RAxML (Stamatakis, 2014) (Figure 4—source data 3) from an alignment of n class="Species">S. saprophyticus genomes (Figure 4—source data 1, Figure 4—source data 2). Bootstrap values less than 100 are labeled. Silhouettes indicate bacterial sample source. Isolates without silhouettes are from humanclinical samples isolated from urine. Color corresponds to country of isolation as seen on the map. Full sample descriptions are in Supplementary file 1H. (B) Source countries of bacterial samples. (C) Neighbor-net network of S. saprophyticus plasmid sequences (Figure 4—source data 4) related to pSST1 created in SplitsTree4 (Huson and Bryant, 2006). The boxed inset is an enlarged version of the portion of the network from Clade P isolates. Some S. saprophyticus isolates do not encode pSST1-like plasmids, and therefore, they are not included in the network. Starts and stops of recombinant regions of the alignment can be found in Figure 4—source data 5.
DOI:
http://dx.doi.org/10.7554/eLife.20983.025
S. saprophyticus whole genome alignment.
DOI:
http://dx.doi.org/10.7554/eLife.20983.026
S. saprophyticus whole genome alignment trimmed with trimal.
DOI:
http://dx.doi.org/10.7554/eLife.20983.027
Maximum likelihood phylogenetic analysis of trimmed S. saprophyticus alignment with RAxML.
DOI:
http://dx.doi.org/10.7554/eLife.20983.028
S. saprophyticus plasmid alignment trimmed with trimal.
DOI:
http://dx.doi.org/10.7554/eLife.20983.029
Recombinant fragments detected with BratNextGen in S. saprophyticus alignment.
DOI:
http://dx.doi.org/10.7554/eLife.20983.030
Ancient DNA damage assessment of S. saprophyticus.
Damage profiles of non-UDG treated (‘nonU’) ren class="Disease">ads from a pooled NOD1_nonU and NOD2_nonU data set (total of 1,565,548 trimmed reads >24 bp) mapping to S. saprophyticus strain ATCC 15305. Paired end reads were mapped using bwa (Li and Durbin, 2009) with default settings and duplicates were removed with samtools rmdup (Li et al., 2009). Damage profiles were generated using mapDamage2 (Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.031
Fragment length distribution (FLD) for S. saprophyticus ATCC 15305.
All nodule shotgun libraries (Nod1_1h-n class="Gene">UDG, Nod1_1h-nonU, Nod2-UDG, Nod2-nonU) were pooled, reads were restricted to a minimum length of 35 bp and mapping quality of 30 and all duplicates removed both within and between libraries. The fragment length distribution of the remaining 3,904,552 reads was visualized using mapDamage2 (Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.032S. saprophyticus (n class="Chemical">NC_007350), average coverage 298.6X. All reads were restricted to minimum length of 35 bp and minimum map quality 30 with all duplicates removed. Figures depict coverage of the genome in 100 bp blocks across references. Concentric grey circles demarcate increments of 50X coverage in both plots.
DOI:
http://dx.doi.org/10.7554/eLife.20983.033Networks created in SplitsTree v 4 (Huson and Bryant, 2006) of S. saprophyticus. The networks recapitulate the structure of maximum likelihood trees (Figure 4).DOI:
http://dx.doi.org/10.7554/eLife.20983.034
Presence of mobile genetic elements, virulence genes, and antibiotic resistance in S. saprophyticus.
Novobiocin resistance is conferred by a n class="Chemical">glycine at position 85 and lysine at position 140 (Vickers et al., 2007), which is present in all S. saprophyticus genomes examined here. SSP1924 and fosB confer streptomycin and fosfomycin resistance, respectively, and are encoded in vSs15305 in the ATCC 15305 reference genome (Kuroda et al., 2005). While none of the other isolates encode the entire genomic island, fosB and SSP1924 are found in isolates from both Clade P and Clade E. The canine isolate (K) harbors SCCmec containing mecA conferring methicillin resistance that has been identified in humanclinical isolates of S. saprophyticus (Higashide et al., 2008).
DOI:
http://dx.doi.org/10.7554/eLife.20983.035
Recombinant regions detected by BratNextGen in S. saprophyticus.
Each circle in the figure represents one isolate. Regions with significant evidence for recombination are shown as black or colored blocks. Black ticks mark intervals of 20 kb, and positions are in reference to ATCC15305. 17.9% of the alignn class="Species">ment is recombinant in at least one strain. After removing fragments associated with known MGEs, 15.0% of sites are recombinant in the core genome. Isolates are colored according to clade (purple- Clade E, green- bovine, blue- Troy, black- Clade P).
DOI:
http://dx.doi.org/10.7554/eLife.20983.036Plasmids similar to S. saprophyticusn class="Gene">Troy pSST1 were present in isolates from both clades. The relationships among plasmid sequences from S. saprophyticusTroy and other members of Clade P were distinct from those of the core genome; we also found evidence of recombination among pSST1-like plasmids (Figure 4C, Appendix).
A long branch separates Clade P pSST1 from those of Cln class="Disease">ade E (Figure 4C), recapitulating their relationship on the core genome phylogeny. This was also true of pSSP2, the only other plasmid we identified in both Clades P and E (but not S. saprophyticusTroy; Figure 4—figure supplement 5). These observations suggest plasmids are more readily exchanged within Clades P and E than between them. This could indicate that Clades P and E are spatially segregated, that there are mechanistic barriers to plasmid exchange between clades, or that epistatic interactions reinforce clade separation of these mobile elements.
Figure 4—figure supplement 5.
Presence of mobile genetic elements, virulence genes, and antibiotic resistance in S. saprophyticus.
Novobiocin resistance is conferred by a glycine at position 85 and lysine at position 140 (Vickers et al., 2007), which is present in all S. saprophyticus genomes examined here. SSP1924 and fosB confer streptomycin and fosfomycin resistance, respectively, and are encoded in vSs15305 in the ATCC 15305 reference genome (Kuroda et al., 2005). While none of the other isolates encode the entire genomic island, fosB and SSP1924 are found in isolates from both Clade P and Clade E. The canine isolate (K) harbors SCCmec containing mecA conferring methicillin resistance that has been identified in human clinical isolates of S. saprophyticus (Higashide et al., 2008).
DOI:
http://dx.doi.org/10.7554/eLife.20983.035
The humann class="Species">clinical isolates in Clades P and E are nested within the phylogeny with more divergent lineages associated with other animals. This suggests that the most recent common ancestor (MRCA) of S. saprophyticus may not have been human-associated. This is in stark contrast to the major pathogen in the genus, Staphylococcus aureus, where phylogenetic studies suggest that the MRCA of human and livestock-associated lineages had a human host (Fitzgerald, 2012; Weinert et al., 2012; Shepheard et al., 2013). S. aureus is strongly associated with its niche on the human body and is transmitted primarily from person-to-person. S. saprophyticus, by contrast, appears to be a generalist that colonizes a range of environments.
Several lines of evidence also indicate that humans acquire n class="Species">S. saprophyticus infection from the environment. In northern climates, there is marked seasonal variation in the incidence of S. saprophyticus UTI (Rupp et al., 1992; Hovelius and Mårdh, 1984; Ringertz and Torssander, 1986; Hedman et al., 1993; Widerström et al., 2007), whereas there is no evidence of seasonality in Mediterranean climates (Schneider and Riley, 1996). S. saprophyticus can be identified in environmental samples, with a strong seasonal peak that occurs just before peak rates of S. saprophyticus UTI in northern climates (Hedman et al., 1993; Soge et al., 2009). Molecular epidemiological surveys also suggest S. saprophyticus is primarily acquired from an environmental reservoir, rather than as a result of person-to-person transmission (Widerström et al., 2007; Widerström et al., 2012). These observations suggest that the bacteria cycle between host-associated and environmental stages, with seasonal climatic effects on their abundance in the environment.
The length of the branch leading to n class="Species">S. saprophyticus Troy is similar to those leading to the other tips (Figure 4A), suggesting there is little temporal signal in the phylogeny. Calibrated phylogenetic analyses (Appendix) confirmed the absence of temporal signal, which precludes reliable estimation of the rate of substitution or divergence times for S. saprophyticus.
A mixed environmental, comn class="Species">mensal and pathogenic niche may in part explain the absence of temporal structure in our sample of S. saprophyticus. Selection pressures and generation times are likely to differ between free-living and host-associated stages, which can obscure temporal signals in genetic data (Bromham, 2009). In addition to producing rate variability, periods of dormancy in the environment – e.g. during the winter in northern climates, as suggested by seasonal patterns in cultivability – would be predicted to lower the overall rate at which S. saprophyticus evolves (Bromham, 2009; Weinert et al., 2015). The 800 year interval between S. saprophyticusTroy and the other tips may simply be too short relative to the overall depth of the tree to allow reliable rate inference.
Notably, all n class="Species">human-associated isolates of S. saprophyticus in Clade P form a monophyletic group, to which the bovinemastitis strain falls basally; there are no modern human pathogenic representatives of the S. saprophyticusTroy lineage. This may mean that the ecology of S. saprophyticus differed in the Byzantine world, with humaninfections arising from a different reservoir of bacteria than they do today. S. saprophyticus is readily cultured from the environment around livestock (Hedman et al., 1993; Cherif-Antar et al., 2016), and Byzantine era peasants in Anatolia typically shared their households with cattle (Lefort, 2007). This and other historical settings are likely to have facilitated spillover events and, perhaps, the circulation of bacteria that were adapted to both livestock and humans.
Based on the available data, it is not possible to determine whether the humann class="Species">clinical isolate nested among environmental and food-associated bacteria in Clade E represents a spillover or a second emergence into humans. In either event, it appears that S. saprophyticus can transition to a human pathogenic niche with relative ease. We did not identify any gene content uniquely shared (or absent) among the pathogenic strains in our sample, which suggests that pleiotropy underlies S. saprophyticus’ flexible association with diverse niches. For many bacterial genera, genetic distances between free-living organisms and pathogens are larger than observed here, and pathogen emergence is a singular event characterized by genomic decay and loss of functions required outside the pathogenic niche (Parkhill et al., 2001; Larsson et al., 2009; Reuter et al., 2014). More studies and wider sampling are needed to fully characterize the niche of S. saprophyticus, but our observations reinforce the notion that the adaptive paths to bacterial virulence are more diverse than has previously been appreciated.
Complications of pregnancy and childbirth are major cn class="Chemical">auses of morbidity and mortality worldwide and new threats to maternal health continue to emerge (WHO et al., 2014; Mlakar et al., 2016). Our analyses of the remains of a woman who died in Late Byzantine Troy connect her to this broad historical and epidemiological phenomenon. Her infection was associated with exuberant calcification of the placenta, which replicated maternal, fetal, and bacterial cells in calcium phosphate minerals and preserved a high resolution molecular portrait of their contents. S. saprophyticus, an organism the decedent is likely to have acquired from her environment, and G. vaginalis, a member of the native humanbiota, are the dominant bacterial species of the infection. S. saprophyticusTroy belongs to a lineage that appears to be uncommonly associated with human disease in the modern world, whereas G. vaginalisTroy nests among modern commensal and pathogenic strains on its phylogeny. This highlights the complexity of virulence as a bacterial trait and a potential role of interactions among bacterial species in shaping pathologic outcomes of infection.
Materials and methods
Samples
Ethics approval for the study of the remains of the individual excavated in 2005 from grave 14 (Troy project, University of Tübingen, bone-sample x24.177) in qun class="Disease">adrat x24 at Troy was obtained from Hamilton Health Sciences and McMaster University (REB# 13–146 T). Samples of extant bacteria were provided to investigators without patient identifiers or protected health information; the members of the study team did not have access to any identifiers or protected health information associated with the bacterial isolates. Sediment from the Troy site was imported to and studied at McMaster University in accordance with Canadian Food Inspection Agency guidelines, under permit number P-2012–04220.
Radiocarbon dating
Subsamples of both nodules and the ulna bone were sent to the Keck Carbon Cycle AMS Facility (Earth System Science Departn class="Species">ment, UC Irvine, CA) for radiocarbon dating (Appendix, Supplementary file 1A). In addition to 14C dating ultrafiltered collagen from the ulna, we also attempted to measure 14C in carbonate from nodule one (with 10–30% leaching) and three organic fractions from nodule two: raw nodule (including carbonate), residue from demineralization with room temperature 1NHCl, and residue from demineralization plus gelatinization with 60°C 0.01NHCl.
X-ray diffraction (XRD)
A subsample of nodule two was subjected to mineralogical analysis using XRD at the Brockhouse Institute for Materials Research (McMaster University) using the Bruker D8 DISCOVER with DAVINCI.DESIGn class="Chemical">N diffractometer (Figure 2—figure supplement 1). Sample flakes were piled on top of a single crystal silicon wafer, and aligned to the center of the diffractometer using the laser-video alignment system. The detector to sample distance was calibrated with corundum to 20 cm. Four frames were collected to obtain the 2θ range of 8–103°. The frames were integrated into intensity plots using DIFFRAC.EVA V.3.0 (software package from Bruker AXS). A pattern search/match was executed using the integrated ICDD PDF-2 2011 powder database. Slight mismatch in the peak positions are likely due to variation of elemental stoichiometry in the identified phase.
Scanning electron microscopy (SEM) with energy dispersive X-ray spectroscopy (EDS)
A subsample of nodule two was viewed via SEM and subjected to elemental analysis using SEM-EDS at the MAX Diffraction Facility (McMaster University). Sample pieces were attached to ann class="Chemical">aluminum stub with double-sided carbon tape and sputter-coated with a thin layer of Au. The sample was viewed in a Tescan Vega II LSU operating at 20kV. Energy dispersive spectroscopy (EDS) was carried out with an IMAX detector (Oxford Instruments) and INCA software (Supplementary file 1B).
Ancient DNA extractions
We made multiple Dn class="Chemical">NA extractions from subsamples of two nodules, an ulna, sediment taken from the site and relevant associated blanks/controls. The details of these can all be found in Supplementary file 1J. As the elemental analyses of the nodules suggested a highly mineralized constituent, we extracted them in a similar fashion to bone and they are labelled as such in Supplementary file 1J hyphenated (‘bone’). ‘Bone’ (nodules) DNA extractions, consisted of demineralization (DM), removal and freezing of DM supernatant, incubation of non-demineralized tissue with a custom digestion buffer (DB), removal and freezing of DB supernatant, organic extraction of one/both supernatants, and concentration via filtration. They were performed as follows. For Set A (Supplementary file 1J), multiple consecutive rounds of DM (with 0.5M EDTA) and digestion were performed on a shaker (1000 rpm), with the supernatant(s) from each round subjected to organic extraction and filtration. DB consisted of 20 mM Tris pH 8.0, 0.5% sarcosyl, 250 μg/ml Proteinase K, 5 mM CaCl2, 50 mM DTT, 1% PVP, and 2.5 mM PTB. The breakdown of DM/DB rounds is as follows: round 1 = 1 st 0.5 mL overnight (ON) DM at room temperature (RT), second 0.5 mL DM at RT for 24 hr, and 7 hr digestion at 55°C, rounds 2–5 = ON 1 mL DM + 7 hr digestion at 55°C, and round 6 = ON 1 mL DM only. For subsequent extraction, all round one supernatants were combined and all round 2–6 supernatants were treated separately: supernatants were subjected to organic extraction using half-volume of phenol-chloroform-isopropanol (centrifuged at 16,000 x g for 5m), the aqueous phase of which was extracted with 750 μl chloroform (centrifuged as before). The final aqueous phase was filtered using Amicon Ultra 0.5 mL 10 kDa columns (EMD Millipore Corp., Billerica, MA, USA): columns primed with 450 μl 0.1xTE, followed by sample filtration, washed twice with 450 μl 0.1xTE, and eluted in 50 μl 0.1xTE. For Set B, nodule two, 1 mL DM was performed ON rotating at RT and the supernatant was frozen. Digestion was performed for 7 hr rotating at 55°C using 0.5 mL of DB (same recipe as Set A) and the supernatant was frozen. These supernatants were combined and subjected to organic extraction and filtration as in Set A. Set D, which was the ulna, followed the same protocol as Set B, except they were subjected to an additional initial 30 min demineralization with 500 μl 0.5M EDTA that was not extracted, and the DB did not contain DTT, PVP, or PTB and was performed ON.
Set C, ‘sediment’, Dn class="Chemical">NA extractions were performed using the Mo Bio PowerSoil DNA Isolation Kit (MO BIO Laboratories, Inc., Carlsbad, CA) following the manufacturer’s protocol, with a final elution of 100 μl in 0.1xTE. For each set of samples associated blanks were treated in an identical fashion. Please refer to Supplementary file 1J for details for each of these four sets of extractions.
Library preparation and indexing
Ancient DNA extracts were converted to double-indexed libraries for sequencing on the Illumina platform (list in Supplen class="Species">mentary file 1A). Prior to library preparation, all ulna DNA extracts (E5-1, 2, 3, 4, 7, 8, 9, and 10) were pooled to homogenize library input into multiple UDG and non-UDG libraries, as were the two set D extraction blanks (E5-6 and 12). Libraries were prepared as in (Wagner et al., 2014) using either regular (‘non-UDG’) or deaminated cytosine removal (damage repair; ‘UDG’) protocols, and subsequently amplified using a double-indexing protocol (Kircher et al., 2012; Meyer and Kircher, 2010). Each library set included at least one blank no-template (water) control reaction. Extract input volumes into library preparation were 10 μl (L25-L13), 20 μl (L01-L20), or 25 μl (L28-L38, 1 a-1j, and 1a-blk to 1j-blk). In non-UDG libraries L25-L38, the blunt-ending step was modified with an extended (3 hr) T4 PNK incubation prior to adding the T4 polymerase, in the same manner as the UDG protocol. Double-indexing amplification was performed as in (Wagner et al., 2014) for 10 cycles each, with 20 μl non-diluted library template input (except L25 and L13 which were used at 0.1x dilution) and included at least one no-template negative control reaction. All reactions were purified to 15 ul EB with the MinElute PCR Purification Kit (Qiagen, Hilden, Germany).
Enrichment
Two rounds of human mitochondrion targeted enrichn class="Species">ment were performed on the non-UDG treated ulna specimen for comparison to the nodule shotgun reads. Prior to enrichment, the 7 Ulna-non-D libraries (L31-L38) were pooled to homogenize input, and 9 ul of this pool was used as input into four enrichment reactions (Ulna-D E07-E10) alongside the extraction blank (E11) and a negative control reaction (enrichment blank). The enrichment reactions were performed as for the human mtDNA enrichments in (Wagner et al., 2014), using the same parameters and custom MYbaits baitset (MYcroarray, Ann Arbor, MI) designed from the rCRS sequence (http://www.ncbi.nlm.nih.gov/nucleotide/113200490 NCBI GenBank accession no. J01415.2), but with the following modifications according to updated manufacturer recommendations: post-hybridization bead-library binding was performed rotating at high temp (55°C), Wash Buffer 1 was eliminated, and the post-washed beads were suspended in 20 μl EB and used directly in the post-enrichment amplification. For the adapter-specific blocking oligos, 2 μM of four P5/P7 adapter sequence custom blocking oligos (corresponding to one strand of each molecule) were used for enrichment round 1, and the manufacturer-supplied Block #3 was used for enrichment round 2.
Amplification after each enrichment round was performed as in (Wagner et al., 2014), with n class="Disease">additional re-amplifications as required due to low output molarities (all amplification reactions were purified over MinElute columns to 15 μl EB). Post-round 1 amplification used 15 μl bead mixture directly as input into each 40 μl reaction (15 cycles). 6.5 μl of this purified reaction was used as input into enrichment round 2 (E17-E21), and to increase molarity prior to sequencing, 6.5 μl was used as a template for subsequent re-amplification reactions (12 cycles). Post-round 2 amplification used 10 μl bead mixture as input into two 40 μl reactions per enrichment (15 cycles), and the supernatants from each amplification (two per sample) were purified together. Prior to sequencing, 14 μl of this purified reaction was used as the template for a subsequent re-amplification reaction (16 cycles).
Sequencing and read preparation
All relevant ancient samples (nodules, bone and sediment) and their associated extraction blanks were sequenced. Details on the final data set names, their associated samples/libraries/enrichn class="Species">ments, raw reads passing filter, and pre-sequencing pooling schemes can be found in Supplementary file 1K. Prior to sequencing, all additional indexed libraries (shotgun and enrichment) were quantified via a qPCR assay targeting indexed molecules and pooled according to desired sequencing ratio. All pools except Nod1_all (pool ‘K’) were size-selected using an electrophoresis gel size selection procedure (retaining molecules ~125/150/150 to 500 bp in length) in order to exclude as much no-insert adaptimer (and other short adapter artifacts) as possible. Pool ‘F’ also contained 10 additional samples not considered in this paper (pooled at a ratio of ‘1’). Pools were sequenced across three paired-end runs on the HiSeq 1500, all alongside other, unrelated samples: Pool ‘K’ (80 bp final read length), Pool ‘F’ (85 bp final read length), and pools ‘G'-'J’ (90 bp final read length). On the last run, the enrichment round 1 and 2 samples (pools ‘H’ and ‘J’) were separated on two different lanes, since they have the same index sequences.
Metagenome/Microbiome analyses
For the metagenomic analyses and the ancient pathogen genome reconstructions, all data sets were trimmed of library adapter using cutn class="Disease">adapt (Martin, 2011) with settings -e 0.16, -O 1, -a AGATCGGAAGAGC (70) and reads <24 bp were removed retaining read order. To obtain metagenomic profiles from our shotgun data sets we used LMAT version 1.2.3 (Ames et al., 2013) to properly identify shotgun reads from the nodules (Nod1_1h-UDG, Nod2-UDG), sediment (Sediment-UDG), ulna (Ulna-UDG), and all metagenomic data sets available from the Human Microbiome Project (HMP, RRID:SCR_012956) database, housed at Lawrence Livermore National Laboratory (June 2015). Reads that could be identified at the sequence level or consistently at the species/strain level from all blank extracts were removed from final files used in the PCA analysis.
The PCAs were performed using prcomp (RRID:SCR_014676) in (R Core Team, 2015). The taxa identifications from the HMP were combined according to the origin of the sample. The number of samples combined into each category is indicated in the legend to Figure 1—figure supplement 8. A small number (1×10−7) was n class="Disease">added to those entries with zero reads assigned and then natural logs of the numbers were taken. The PCA was centered and scaled.
Reads were initially processed as described in the previous section. The n class="Species">S. saprophyticus Troy draft genome was reconstructed using iterative assembly to span gaps between contigs that were created from assembly to the S. saprophyticus reference genome (NC_007350).
Trimmed reads from n class="Chemical">Nod1_all-UDG were paired-end assembled to the S. saprophyticus reference (NC_007350) using BWA (RRID:SCR_010910) with default settings (Li and Durbin, 2009), and duplicates were removed using samtools (RRID:SCR_002105) rmdup (Li et al., 2009). The resulting assembly was imported into Geneious (v.6.1.6, Biomatters, Ltd, RRID:SCR_010519) and a strict (50%) consensus sequence was generated. From this consensus, 65 large contigs (880 bp – 170,993 bp) that corresponded to regions of average coverage (and that did not span rRNA/tRNA regions) were manually extracted, which represented the non-gap regions of the assembly. As gap regions could represent indel regions (e.g., lateral gene transfer events), rearrangements, or divergence, these contigs were subjected to an iterative assembly process using a set of unmapped reads in order to attempt to span gaps and connect the contigs. The primary set of reads used for iterative assembly was generated by trimming 100 bp from each end of the contigs, assembling all original reads using the above settings to this set of trimmed contigs, and removing these assembled reads from consideration. A subset of the unmapped reads (20–100% as required, depending on assembly success) along with the full set of contigs were then subject to iterative assembly using Geneious (v.6.1.6), seeded with the first or last 50 bp of a contig (settings: maximum gaps per read 10%, maximum gap size 2, word length 20, index word length 14, ignore words repeated >8x, maximum mismatches per read 1%, maximum ambiguity 4, map multiple best matches randomly). All non-rRNA-adjacent gaps were successfully spanned, except for the region that was discovered to belong on the plasmid rather than the chromosome.
Gardnerella vaginalis Troy gene reconstruction
Ancient gene sequences were reconstructed de novo, via assembly of a pool of reads that mapped to annotated n class="Species">G. vaginalis. First, CDS annotations were extracted from 34 modern G. vaginalis assemblies (Supplementary file 1I; except for strains 41V and 101) and concatenated into one reference/genome (100 N’s between each CDS). Trimmed paired end reads from Nod1_1h-UDG reads were mapped to the concatenated reference. All paired and unpaired reads that mapped were extracted and subjected to de novo assembly using Velvet 1.2.1 (Zerbino et al., 2008) with settings kmer 23, insert length 51, expected coverage 75, minimum contig length 24, and coverage cutoff auto (parameters for expected coverage were chosen based on previous observation of assembly to strain ATCC 14019). This generated 1207 contigs that were confirmed using blastn (default settings, RRID:SCR_004870) to the nr database (April 2014) to detect any non-G. vaginalis sequences or chimeras generated from low level bacterial species also present in the nodules. Twenty contigs were excluded due to the top hit being S. saprophyticus, leaving a final set of 1187 G. vaginalis contigs (total length 1,435,761 bp). The final set of genomic contigs was annotated using Prokka 1.7 (Seemann, 2014) (RRID:SCR_014732) producing 972 genes. Paired-end assembly of Nod1_1h-UDG reads to the final set of contigs with bowtie2 (Langmead and Salzberg, 2012) and removal of duplicates with samtools rmdup (Li et al., 2009) consists of 2,034,514 readpairs. Total reads mapping to G. vaginalis from paired end-assemblies are listed in Supplementary file 1L.
Bacterial genome coverage and DNA damage estimations
To most conservatively assess the abundance, coverage, and authenticity of our ancient ren class="Disease">ads, we ran a subset of analyses using slightly more stringent criteria. CASAVA (RRID:SCR_001802) processed reads were trimmed and merged with leeHom (Renaud et al., 2014) (RRID:SCR_002710) using its ancient DNA parameter (--ancientdna). We restricted reads from the UDG-treated shotgun nodule libraries (Nod1_1h-UDG and Nod2-UDG) to those of minimum 35 bp length and blasted all reads against the GenBank nucleotide database retaining only the top hit. For all libraries, we additionally mapped to the S. saprophyticus ATCC 15305 (NC_007350) and G. vaginalis ATCC 14019 (NC_014644) with a customized version of the Burrows-Wheeler Aligner (Li and Durbin, 2009) obtained from https://github.com/mpieva/network-aware-bwa) with a maximum edit distance of 0.01 (-n 0.01), allowing for no more than two gaps (-o 2) and with seeding effectively disabled (-l 16500), retaining only those mapped reads which were merged or properly paired [https://github.com/grenaud/libbam/retrieveMapped_single_and_ProperlyPair.cpp]. Molecules that were less than 35 bp in length, had a mapping quality score of less than 30, or were marked as duplicates based on both 5’ and 3’ coordinates were removed [https://bitbucket.org/ustenzel/biohazard.git]. We then pooled all nodule reads DNA originating from the shotgun DNA libraries (Nod1_1h-UDG, Nod1_1h-nonU, Nod2-UDG, Nod2-nonU) and the S. saprophyticus and G. vaginalis mappings and further removed any duplicated molecules found between libraries (Supplementary file 1L). For both mapping assemblies, the average coverage at each reference position was calculated using the bedtools (Quinlan et al., 2010) (RRID:SCR_006646) genomecov function and then averaged over 100 bp blocks and visualized with Circos (Krzywinski, 2009) (RRID:SCR_011798, Figure 3—figure supplement 3, Figure 4—figure supplement 3). Fragment length distributions for all pooled libraries and damage plots for the non-UDG treated libraries (Nod1_1h-nonU and Nod2-nonU) were calculated through mapDamage2 (Jónsson et al., 2013) (Figure 3—figure supplement 1, Figure 4—figure supplement 1).
Figure 3—figure supplement 1.
Ancient DNA damage assessment of G. vaginalis.
Damage profiles of non-UDG treated (‘nonU’) reads from a pooled NOD1_nonU and NOD2_nonU data set (total of 1,565,548 trimmed reads >24 bp) mapping to G. vaginalis strain ATCC 14019. Paired end reads were mapped using bwa (Li and Durbin, 2009) with default settings and duplicates were removed with samtools rmdup (Li et al., 2009). Damage profiles were generated using mapDamage2 (Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.021
Figure 4—figure supplement 1.
Ancient DNA damage assessment of S. saprophyticus.
Damage profiles of non-UDG treated (‘nonU’) reads from a pooled NOD1_nonU and NOD2_nonU data set (total of 1,565,548 trimmed reads >24 bp) mapping to S. saprophyticus strain ATCC 15305. Paired end reads were mapped using bwa (Li and Durbin, 2009) with default settings and duplicates were removed with samtools rmdup (Li et al., 2009). Damage profiles were generated using mapDamage2 (Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.031
Human mitochondrial genome analyses
CASAVA processed reads (see directly above), from enriched ulna extractions (Ulna_Enr1-nonU, Ulna_Enr2-nonU), shotgun ren class="Disease">ads from the UDG treated nodule extraction (Nod1_1h-UDG), and corresponding extraction blanks (EblkD_Enr1-nonU, EblkD_Enr2-nonU and EblkA-UDG) were processed as described above, but mapped to the rCRS mitochondrial genome (NC_012920) (Andrews et al., 1999). Consensus sequences were called and contamination was estimated using Schmutzi, which implements iterative probability models to infer the endogenous bases given read length and deamination patterns (Renaud et al., 2015). Contamination was estimated at 12% and 13%, respectively, for the round 1 and round 2 enriched ulna libraries. These estimates are consistent with estimates from other aDNA studies (Posth et al., 2016). Contamination could not be confidently assessed from the shotgun nodule library as it had been UDG treated, which obfuscates deamination patterns and thereby lessens the differentiation between endogenous and contaminant molecules. mtDNA consensus sequences were uploaded to the HaploGrep webserver [http://haplogrep.uibk.ac.at/] and haplogroups were determined in reference to Phylotree Build 16 (Kloss-Brandstätter et al., 2011; van Oven and Kayser, 2009) (RRID:SCR_012948) and found to be U3b and U3b3. Haplogroup U3b was assigned to the consensus sequence generated from the first round enrichment of the ulna because there was missing data for polymorphisms diagnostic to the haplogroup U3b3 (Supplementary file 1E). All three consensus sequences shared an additional five private polymorphisms not diagnostic to haplogroup U3b3. 137 sequences assigned to haplogroup U3 were collected from all human complete mtDNA genomes in GenBank (18 June 2015), and aligned along with the Troy consensus sequence generated from the nodule shotgun (Nod1_1h-UDG) library with MUSCLE v3.8 (Edgar, 2004) (RRID:SCR_011812). It was determined that the best model of nucleotide substitute for this group of 138 sequences was HKY+I+Γ using the program jModelTest2 (Darriba et al., 2012) and the built-in Akaike Information Criterion (Akaike, 1974). A Bayesian Maximum Clade Credibility tree was calculated using BEAST v1.8 (Drummond et al., 2012) (RRID:SCR_010228) and TreeAnnotator (Drummond et al., 2007) with the nucleotide data partitioned between coding and non-coding and a strict molecular clock with evolutionary rates of 1.708×10−8 and 9.88 3×10−8 nucleotide substitutions/site/year following Soares et al. (Soares et al., 2009) (Figure 1—figure supplement 6). Damage patterns and fragment length distribution of ancient DNA mapped to mitochondrial genome can be found in Figure 1—figure supplements 3–4.
Figure 1—figure supplement 3.
Fragment length distributions for non-UDG treated human mitochondrial assemblies.
These FLDs were generated from the Ulna enriched libraries (A + B) as well as the non enriched nodule (C) using mapDamage2 (Jonsson et al., 2013) from merged nonUDG data sets assembled to the human mitochondrial reference genome (Andrews et al., 1999), NCBI accession NC_012920.
DOI:
http://dx.doi.org/10.7554/eLife.20983.006
Figure 1—figure supplement 4.
Ancient DNA damage assessment of human mitochondrial reads.
Damage profiles of non-UDG treated (‘nonU’) as well as UDG treated merged reads assembled to the human mitochondrial rCRS reference genome (NC_012920) for (A) Ulna_Enr1-nonU round one human mitochondrial enrichment, (B) Ulna_Enr2-nonU round 2, and (C) a Nod1_1h-UDG reads (which have been UDG treated). Damage profiles were generated using mapDamage2.(Jonsson et al., 2013).
DOI:
http://dx.doi.org/10.7554/eLife.20983.007
Human nuclear genome analyses
Reads from four shotgun libraries (three from the nodules, n class="Chemical">Nod1_1 hr_UDG, Nod2-UDG, Nod2-nonU; one from the ulna, Ulna-UDG) were mapped and processed as described for the mitochondrial genome above. Additionally, the reads originating from the four nodule libraries were pooled together for comparison (‘Nodule Pooled’). We mapped the libraries to a hard-masked hg38 reference genome downloaded from the UCSC genome browser [http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/] and recorded the number of reads mapping to chrX, chrY, mitochondrial genome and all autosomes. We first filtered for mapped merged or properly paired reads with a minimum length of 35 bp and a minimum mapping quality filter of 30. Percent coverage was calculated by tallying the number of positions covered by at least one read and dividing by the total genome length with masked regions subtracted. We calculated the coverage depth by summing coverage of all positions and dividing the total by this same masked genome length (Supplementary file 1G).
Extraction and sequencing of modern S. saprophyticus
Fourteen new S. saprophyticus isolates (North America: eight human, one bovine, and one canine; Australia: two human; Japan: two human) were sequenced for this study to provide a broader comparative genomic data set (Supplementary file 1H).
Extraction and sequencing of North American S. saprophyticus
DNA was extracted at the University of Wisconsin-Mn class="Disease">adison. Isolates were inoculated in TSB and grown overnight at 37°C in a shaking incubator. Cultures were pelleted, resuspended in 140 μL TE, and incubated overnight with 50 units of mutanolysin. DNA was extracted using the MasterPure Gram Positive DNA Purification Kit (EpiCentre). Extracts were prepared for sequencing with the Illumina Nextera XT library preparation kit and sequenced in two different batches: samples 13–31 were pooled in equal ratio with 24 additional unrelated samples and sequenced on a MiSeq platform using a 2×250 kit; samples K, X, and M were sequenced at a desired ratio alongside pools ‘G'-'J’ (described above) as part of the HiSeq 1500 paired-end run (85 bp read length). Average insert sizes (estimated from Agilent Bioanalyzer analysis) are 849 bp (13–31, pooled average), 775 bp for M, 716 bp for X, and 769 bp for K.
Extraction and sequencing of Australian S. saprophyticus
DNA extractions were performed as described above. Dn class="Chemical">NA was submitted to the University of Wisconsin-Madison Biotechnology Center. DNA concentration was verified using the Qubit dsDNA HS Assay Kit (Life Technologies, Grand Island, NY). Samples were prepared according the TruSeq Nano DNA LT Library Prep Kit (Illumina Inc., San Diego, California, USA) with minor modifications. A maximum of 200 ng of each sample was sheared using a Covaris M220 Ultrasonicator (Covaris Inc, Woburn, MA, USA). Sheared samples were size selected for an average insert size of 550 bp using Spri bead based size exclusion. Quality and quantity of the finished libraries were assessed using an Agilent DNA High Sensitivity chip (Agilent Technologies, Santa Clara, CA) and Qubit dsDNA HS Assay Kit, respectively. Libraries were standardized to 2 μM. Paired-end, 150 bp sequencing was performed using v2 SBS chemistry on an Illumina MiSeq sequencer. Images were analyzed using the Illumina Pipeline, version 1.8.2.
Extraction and sequencing of Japanese S. saprophyticus
S. saprophyticus cells were lysed by achromopeptidase (WAKO, Kyoto, Japan), and the genomic Dn class="Chemical">NA was prepared with conventional phenol/chloroform extraction and ethanol precipitation, followed by further purification with QIAGEN genome DNA preparation kit. Library preparation was completed using the Nextera XT DNA Sample Prep Kit (Illumina, San Diego, CA, USA), followed by insert size selection using 1% TAE agarose electrophoresis to obtain an insert of approximately 400 bp. Sequencing was performed on NextSeq 500 (Illumina, San Diego, CA, USA) using the NextSeq 500 v1 kit (300 cycle) with paired-end 150 bp sequencing.
Reference-guided assembly and alignment of S. saprophyticus
Reads for the new modern samples were processed with reference-guided assembly via a pipeline [https://github.com/tracysmith/RGAPepPipe]. For reference guided assembly, ren class="Disease">ad quality was assessed and trimmed with TrimGalore! v 0.4.0 [www.bioinformatics.babraham.ac.uk/projects/trim_galore], a wrapper script for FastQC [www.bioinformatics.babraham.ac.uk/projects/fastqc, RRID:SCR_005539] and cutadapt (Martin, 2011) (RRID:SCR_011841). Reads were mapped to the ATCC 15305 reference genome using BWA-MEM v 0.7.12 (Li, 2013) (RRID:SCR_010910) and bam files sorted using Samtools v 1.2 (Li et al., 2009) (RRID:SCR_002105). Read group information was edited and duplicates removed using Picard v 1.138 [picard.sourceforge.net, RRID:SCR_006525]. Reads were locally realigned using GATK v 2.8.1 (DePristo et al., 2011) (RRID:SCR_001876). Variants were called using Pilon v 1.16 (Walker et al., 2014) (RRID:SCR_014731) with a minimum read depth of 10, minimum mapping quality of 40 and minimum base quality of 20. Whole genome alignment of the Troy strain and de novo assemblies to ATCC 15305 was performed using Mugsy 2.3 (Angiuoli et al., 2011) (RRID:SCR_001414).
De novo assembly and annotation of S. saprophyticus
The draft genome sequences of Japanese isolates were obtained by de novo assembly using CLC genome workbench v8.02 (RRID:SCR_011853). For North American and n class="Chemical">Australian S. saprophyticus genomes, de novo assembly was performed using the iMetAMOS pipeline (Koren et al., 2014; Treangen et al., 2013) (RRID:SCR_011914). We compared assemblies from SPAdes (Bankevich et al., 2012), MaSurCA (Zimin et al., 2013), and Velvet (Zerbino et al., 2008). KmerGenie (Chikhi and Medvedev, 2014) was used to select kmer sizes for assembly. iMetAMOS uses FastQC [www.bioinformatics.babraham.ac.uk/projects/fastqc] to check read data quality. Assemblies were evaluated using QUAST (Gurevich et al., 2013), REAPR (Hunt et al., 2013), LAP (Ghodsi et al., 2013), ALE (Clark et al., 2013), FreeBayes (Garrison and Marth, 2012), and CGAL (Rahman et al., 2013). Additionally, Kraken (Wood et al., 2014) was run to detect potential contamination in sequence data. The SPAdes assembly was identified as best for isolates 13, 16, 19, 41, 42, 43, K, M, X, and 129. The MaSurCA assembly was identified as best for isolates 14, 15, 18, and 31. Genomes were annotated using Prokka 1.7 (Seemann, 2014) (RRID:SCR_014732). OrthoMCL v2.0.9 (Li et al., 2003) (RRID:SCR_007839) was used to find orthologous genes in these genomes.
Core genome alignment of G. vaginalis
Genomes were annotated with Prokka 1.7 (Seemann, 2014) (RRID:SCR_014732). OrthoMCL v2.0.9 (RRID:SCR_007839) grouped genes into orthologous groups (Li et al., 2003). Genes were filtered to include only genes present in one copy in every genome. Individual genes (n = 537) were aligned with TranslatorX (RRID:SCR_014733) and MAFFT v7.130b (Abascal et al., 2010; Katoh and Standley, 2014) (RRID:SCR_011811) and concatenated [https://github.com/tatumdmortimer/core-genome-alignn class="Species">ment].
Phylogenetic analyses
Maximum likelihood phylogenetic trees were inferred using RAxML 8.0.6 (Stamatakis, 2014) (RRID:SCR_006086). Bootstrap replicates (number determined by autoMR convergence criteria) were applied to the tree with the highest likelihood of twenty using the GTRGAMMA substitution model. We used SplitsTree4 (Huson and Bryant, 2006) (RRID:SCR_014734) to create networks of pSST1, the chromosome in n class="Species">S. saprophyticus, and the core genome of G. vaginalis.
Recombination
Recombination in a whole genome alignment of n class="Species">S. saprophyticus isolates and a core genome alignment of modern G. vaginalis isolates was assessed using BratNextGen (Marttinen, 2012). For both S. saprophyticus and G. vaginalis analyses, one hundred permutations were performed to calculate the significance (p<0.05) of recombinant fragments (plots created with Circos [Krzywinski, 2009]). Recombination was also measured for the pSST1 plasmid alignment using Phi (Bruen et al., 2006), Max χ2 (Smith, 1992), and NSS (Jakobsen and Easteal, 1996) implemented in PhiPack; results of these tests were all significant with p-values of 5×10−15, 0, and 0, respectively.
Variant annotation
We used SNP-sites (Page et al., 2016) to convert the alignn class="Species">ment of S. saprophyticus isolates to a multi-sample VCF. SnpEff (Cingolani, 2012) (RRID:SCR_005191) was used to annotate variants in this VCF as synonymous, non-synonymous, or intergenic.
Analysis of temporal structure
To determine whether there was sufficient temporal structure in the S. saprophyticus phylogeny to estimate evolutionary rates, we performed a regression of root-to-tip genetic distances against year of sampling using TempEst v 1.4 (Rambn class="Chemical">aut et al., 2016). We also attempted to estimate evolutionary rates using BEAST v1.8 (Drummond et al., 2012) (RRID:SCR_010228). Results of both these analyses (Appendix) revealed a lack of temporal structure such that rate (and date) estimates are unreliable.
Analysis of population structure
We used a Bayesian tree sampling method implemented in BaTS (vBETA2) (Parker et al., 2008) to determine the significance of phylogenetic clustering and population structure in our n class="Species">S. saprophyticus data. A distribution of phylogenies was generated using BEAST v1.8 (Drummond et al., 2012) (RRID:SCR_010228) with GTR+Γ substitution model, a strict molecular clock, and a constant population size. Markov chains were run in duplicate for 10 million generations each with sampling every 1000 generations, and the first 1 million generations were discarded as burn-in.
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the n class="Chemical">authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "An ancient emerging infection as a cn class="Chemical">ause of maternal sepsis in Late Byzantine Troy" for consideration by eLife. Your article has been favorably evaluated by Richard Losick as the Senior Editor and three reviewers, including George H Perry (Reviewer #1) - who is a member of our Board of Reviewing Editors. – and Laura Weyrich (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.Summary:Devault et al. use Dn class="Chemical">NA sequencing to describe, for the first time, urogenital infections in a woman buried ~800 years ago in Troy. The authors compare DNA present within calcified nodules (which are also analyzed morphologically) that were discovered in association with human skeletal remains to environmental, skeletal, and modern data to determine the source of the nodules and identify two bacterial species preserved within. The authors hypothesize that the infection may have been linked with pregnancy, due to some Y chromosome reads preserved within the nodule. The preservation of the Staphylococcus saprophyticus DNA, especially, inside the calcified nodule is astounding, providing opportunity for reconstructing a complete, high-quality genome sequence.
Essential revisions:The reviewers have raised and discussed several major concerns that must be adequately n class="Disease">addressed before your paper can be considered for acceptance. These necessary revisions all involve re-analysis of existing data and associated modifications to the presentation and interpretation of the results in the text, rather than generation of new data. Thus, we believe that these revisions can reasonably be completed within two months.
1) There is insufficient evidence presented in the manuscript to support the authors' conclusions about catching the "emergence" of a pathogenic n class="Species">human S. saprophyticus strain, or to determine whether this pathogen was passed from domestic animals to humans or vice versa, or even to exclude other potential scenarios in which both populations originally acquired the pathogen from the same source. This claim is made with a single cattle isolate that falls basal to the strain identified in Troy; however, other animal isolates fall within the tree. In the absence of additional cattle strains and wider sampling in general, the results should be treated with appropriate caution and interpretations tempered. Divergence date estimates generated with the field's current gold standard (BEAST) should be provided in the main text. If these estimates cannot be obtained for a technical reason or limitation then this needs to be clarified in the main text, not only the supplementary information.
1A) Note that an associated title change will likely be necessary.2) The hypothesis of a fetal origin for the calcified nodule is intriguing, but needs further exploration. The genetic sex analyses in the paper are not rigorous to current standards, and leave more uncertainty than necessary. The authors should apply the Skoglund et al. 2013 J. Arch. Sci. (Accurate sex identification of ancient n class="Species">human remains using DNA shotgun sequencing) analysis to the data from both the nodule and the ulna, and then illustrate this result with a main figure, since this differential diagnosis is a key finding of the paper. The Y chromosome damage patterns presented in the supplementary information are not sufficiently robust (due to the limited number of reads) to determine if the DNA is ancient. The number of nuclear genome SNP sites covered by one or more reads in both the ulna and the nodule is likely insufficient for an assessment of relatedness. Please check and either perform the analysis or expand on the discussion of limitations in the manuscript.
3) The authors n class="Species">mention that extraction blank control samples were processed for enrichment, but the metagenome data for these samples are not shown. These data should be provided to further demonstrate authenticity of the results. For example, the authors could strengthen the argument that the Y chromosome and S. saprophyticus reads are endogenous to the sample and not the lab environment. With the high proportions of DNA mapping to S. saprophyticus, it is unlikely that these data reflect contaminants; however, the laboratory extraction procedure was extensive. Showing and discussing the extraction metagenome throughout the text would help to alleviate this concern.
4) The quality control assessment of whether the nodules belonged to the female whose skeletal remains they were associated has a logical flaw (easily corrected). n class="Species">Human mtDNA genome sequences obtained from both the nodule and from aDNA extracted from the individual's ulna were identical. The authors interpret this result as evidence that the nodule and the ulna come from the same individual. Of course, the authors elsewhere in the manuscript suggest a placental origin for the nodules, via chorioamnionitis with a male fetus. This underscores the inability for mtDNA to facilitate confident individual identifications, since it is inherited maternally without recombination. Thus, the authors should be more circumspect in their identification based on this evidence. If this is DNA from a male fetus, then all evidence is indeed pointing to an association with the female whose skeletal remains were recovered, but please ensure precision when discussing this evidence.
5) There is currently no description provided of the osteological materials and the methods and results leading to the conclusion that the skeletal remains are from a likely female individual, of ~30 years of age. This needs to be remedied, as these designations are critical for the interpretation.6) All data must be provided via appropriate repositories, with clear descriptions of accession numbers for each dataset. Sequence reads from all experin class="Species">ments (and ideally intermediate processing files, and alignments, e.g., through Dryad), including the metagenomic datasets and the raw data from the bone (i.e., not simply the genome sequences), the SEM data, XRD data, etc., associated with this work all need to be thoroughly curated and made available.
Essential revisions:The reviewers have raised and discussed several major concerns that must be adequately n class="Disease">addressed before your paper can be considered for acceptance. These necessary revisions all involve re-analysis of existing data and associated modifications to the presentation and interpretation of the results in the text, rather than generation of new data. Thus, we believe that these revisions can reasonably be completed within two months.
1) There is insufficient evidence presented in the manuscript to support the authors' conclusions about catching the "emergence" of a pathogenic n class="Species">human S. saprophyticus strain, or to determine whether this pathogen was passed from domestic animals to humans or vice versa, or even to exclude other potential scenarios in which both populations originally acquired the pathogen from the same source. This claim is made with a single cattle isolate that falls basal to the strain identified in Troy; however, other animal isolates fall within the tree. In the absence of additional cattle strains and wider sampling in general, the results should be treated with appropriate caution and interpretations tempered. Divergence date estimates generated with the field's current gold standard (BEAST) should be provided in the main text. If these estimates cannot be obtained for a technical reason or limitation then this needs to be clarified in the main text, not only the supplementary information.
1A) Note that an associated title change will likely be necessary.We appreciate the reviewers’ comments that we have insufficient evidence to identify the emergence of a n class="Species">human pathogenic lineage of S. saprophyticus. We have therefore removed all references to this concept and changed the title to “A molecular portrait of maternal sepsis from Byzantine Troy”.
We attempted to estimate the nucleotide substitution rate for S. saprophyticus but were unable to generate precise estimates due to a striking lack of temporal signal in the data. We have n class="Disease">added a section to the main text that describes these attempts and our hypotheses about potential reasons we failed to observe any temporal signal.
Upon re-review of the supplementary material, we decided that the rate analyses for Gardnerella vaginalis were based on a sample of insufficient size and we have removed this paragraph.2) The hypothesis of a fetal origin for the calcified nodule is intriguing, but needs further exploration. The genetic sex analyses in the paper are not rigorous to current standards, and leave more uncertainty than necessary. The authors should apply the Skoglund et al. 2013 J. Arch. Sci. (Accurate sex identification of ancient n class="Species">human remains using DNA shotgun sequencing) analysis to the data from both the nodule and the ulna, and then illustrate this result with a main figure, since this differential diagnosis is a key finding of the paper.
Our hypothesis about the origin of the nodules was not well described in the original manuscript: we don’t posit a fetal origin for the nodules, rather we hypothesize that they are of placental origin. Placental tissue is of both fetal and maternal origin, and the inflammatory response to infection of the placenta and fetal membranes involves cells from mother and fetus. Our observations of the nodules are consistent with their having an origin as placental abscesses consisting of bacterial cells, maternal inflammatory cells with a minority component of fetal inflammatory cells. We have an class="Species">mended the description in the main text to make this logic more explicit and clear to the reader.
To address comn class="Species">ments regarding appropriate sexing of the remains we have applied the methodology of Skoglund et al. This assigns a female sex to the nodules, with high certainty, reflecting a majority maternal origin/signal for the inflammatory cells in the abscess. We have added this detail to the main text and the supplementary material with an associated table. We don’t however think that a figure is necessary to convey this information and hope the reviewer concurs.
We also agree with the reviewer that there are insufficient data from nuclear SNPs for an assessn class="Species">ment of relatedness and thus feel that a figure would unduly draw attention to this aspect of the paper. We have modified the text accordingly (re: sexing).
“We attempted to sex the remains (i.e. calcified nodules, ulna and their associated blanks as well as the surrounding sediments and blanks) using the method developed and published by Skoglund et al. (2013). […] Importantly, blanks, also with low ren class="Disease">ad counts (between 1 and 35) are labelled as ‘consistent with XX’ however with 95% confidence intervals (at those levels) of 0.”
The Y chromosome damage patterns presented in the supplementary information are not sufficiently robust (due to the limited number of ren class="Disease">ads) to determine if the DNA is ancient. The number of nuclear genome SNP sites covered by one or more reads in both the ulna and the nodule is likely insufficient for an assessment of relatedness. Please check and either perform the analysis or expand on the discussion of limitations in the manuscript.
The number of reads mapping to the Y in our nodule libraries is low (884) but not (as we have shown) insignificant, when compared to all necessary controls (sedin class="Species">ment, blanks, etc.).
While the signal of deamination (as C —> T and G—> A) is present at the termini (-1 to – 14bp), it is indeed low. This is due to the fact that the majority of Y reads come from the more deeply sequenced, n class="Gene">UDG treated libraries.
In summary, we feel that the Y- chromosomal FLDs’ low but none-the-less present damage signal, their independent identification in the Skoglund et al. 2013 method (at the reviewers’ suggestion), and our Fisher’s exact test, suggest their authenticity more so than their identification as spurious low level contamination not found in equally sequenced bone or sedin class="Species">mentary remains. We hope the reviewers agree.
3) The authors n class="Species">mention that extraction blank control samples were processed for enrichment, but the metagenome data for these samples are not shown. These data should be provided to further demonstrate authenticity of the results. For example, the authors could strengthen the argument that the Y chromosome and S. saprophyticus reads are endogenous to the sample and not the lab environment. With the high proportions of DNA mapping to S. saprophyticus, it is unlikely that these data reflect contaminants; however, the laboratory extraction procedure was extensive. Showing and discussing the extraction metagenome throughout the text would help to alleviate this concern.
We apologize if this was not clear, there are a lot of data spread across various tables. The results the reviewer is requesting are indeed listed in Supplen class="Species">mentary file 1G. This table is (as quoted in the supplementary file) a “Summary of unique shotgun reads (from nodules, ulna, sediment and associated blanks) of minimum length 35bp and minimum mapping quality of 30 (or greater), mapping to Staphylococcus saprophyticus and Gardnerella vaginalis. Nodule pooled =Nod2-UDG + Nod1_1h-nonU + Nod2-nonU + Nod1_1h-UDG).”
As the reviewers will see, there are low level reads in the blanks that do map to n class="Species">S. saprophyticus (0-53 reads, the highest being the sediment) and G. vaginalis (0-9). In both cases, when compared with the nodules these represent minute fractions of what’s detected in the nodules and thus unlikely to be contamination. We did, in addition, perform metagenomic analysis on all reads found in the blanks. Reads (>100) that could be identified at the sequence level or the species/strain level from all blank extracts were removed from final files used in the PCA analysis.
4) The quality control assessment of whether the nodules belonged to the female whose skeletal remains they were associated has a logical flaw (easily corrected). n class="Species">Human mtDNA genome sequences obtained from both the nodule and from aDNA extracted from the individual's ulna were identical. The authors interpret this result as evidence that the nodule and the ulna come from the same individual. Of course, the authors elsewhere in the manuscript suggest a placental origin for the nodules, via chorioamnionitis with a male fetus. This underscores the inability for mtDNA to facilitate confident individual identifications, since it is inherited maternally without recombination. Thus, the authors should be more circumspect in their identification based on this evidence. If this is DNA from a male fetus, then all evidence is indeed pointing to an association with the female whose skeletal remains were recovered, but please ensure precision when discussing this evidence.
As noted in the response to question 2 above, we believe that the nodules are of both maternal and fetal origin. As the reviewers point out, mitochondrial sequences of maternal inflammatory cells, fetal inflammatory cells and maternal bone cells should be identical. Our finding of identical mitochondrial haplotypes in the ulna and nodules is consistent with this scenario.We feel that the most parsimonious explanation of our observation of identical haplotypes in the nodules and ulna is that we have enriched and sequenced the identical woman’s mitogenome, with a possible minority component from her fetus. However, there is no precise way, without n class="Chemical">autosomal SNPs, to be 100% certain. For this reason, we have modified the following sentence in the main text to read:
“The nodule and the ulna share the identical mitochondrial haplotype (Supplementary file 5), suggesting that they likely stem from the same individual or a maternal relative.”5) There is currently no description provided of the osteological materials and the methods and results leading to the conclusion that the skeletal remains are from a likely female individual, of ~30 years of age. This needs to be remedied, as these designations are critical for the interpretation.Thank you for pointing this out. We have added a detailed description of the osteological materials and methods used for sexing and age estimation of the woman at Troy’s remains (1st section of the supplement).6) All data must be provided via appropriate repositories, with clear descriptions of accession numbers for each dataset. Sequence reads from all experin class="Species">ments (and ideally intermediate processing files, and alignments, e.g., through Dryad), including the metagenomic datasets and the raw data from the bone (i.e., not simply the genome sequences), the SEM data, XRD data, etc., associated with this work all need to be thoroughly curated and made available.
The sequencing data have been deposited under the BioProject IDs PRJNA352403 and PRJNA352376. Other data supporting the paper are included as supplementary material.
Authors: Todd J Treangen; Sergey Koren; Daniel D Sommer; Bo Liu; Irina Astrovskaya; Brian Ondov; Aaron E Darling; Adam M Phillippy; Mihai Pop Journal: Genome Biol Date: 2013-01-15 Impact factor: 13.583
Authors: Michael D Harwich; Joao M Alves; Gregory A Buck; Jerome F Strauss; Jennifer L Patterson; Aminat T Oki; Philippe H Girerd; Kimberly K Jefferson Journal: BMC Genomics Date: 2010-06-11 Impact factor: 3.969
Authors: Lucy A Weinert; John J Welch; Marc A Suchard; Philippe Lemey; Andrew Rambaut; J Ross Fitzgerald Journal: Biol Lett Date: 2012-05-23 Impact factor: 3.703
Authors: Sasha K Ames; David A Hysom; Shea N Gardner; G Scott Lloyd; Maya B Gokhale; Jonathan E Allen Journal: Bioinformatics Date: 2013-07-04 Impact factor: 6.937
Authors: George S Long; Jennifer Klunk; Ana T Duggan; Madeline Tapson; Valentina Giuffra; Lavinia Gazzè; Antonio Fornaciari; Sebastian Duchene; Gino Fornaciari; Olivier Clermont; Erick Denamur; G Brian Golding; Hendrik Poinar Journal: Commun Biol Date: 2022-06-16
Authors: Tatum D Mortimer; Douglas S Annis; Mary B O'Neill; Lindsey L Bohr; Tracy M Smith; Hendrik N Poinar; Deane F Mosher; Caitlin S Pepperell Journal: mSphere Date: 2017-11-29 Impact factor: 4.389
Authors: Ana T Duggan; Jennifer Klunk; Ashleigh F Porter; Anna N Dhody; Robert Hicks; Geoffrey L Smith; Margaret Humphreys; Andrea M McCollum; Whitni B Davidson; Kimberly Wilkins; Yu Li; Amanda Burke; Hanna Polasky; Lowell Flanders; Debi Poinar; Amogelang R Raphenya; Tammy T Y Lau; Brian Alcock; Andrew G McArthur; G Brian Golding; Edward C Holmes; Hendrik N Poinar Journal: Genome Biol Date: 2020-07-20 Impact factor: 13.583
Authors: Allison K Guitor; Amogelang R Raphenya; Jennifer Klunk; Melanie Kuch; Brian Alcock; Michael G Surette; Andrew G McArthur; Hendrik N Poinar; Gerard D Wright Journal: Antimicrob Agents Chemother Date: 2019-12-20 Impact factor: 5.191