Susanna Sabin, Alexander Herbig, Åshild J Vågene, Torbjörn Ahlström, Gracijela Bozovic, Caroline Arcini, Denise Kühnert, Kirsten I Bos.
Abstract
BACKGROUND: Although tuberculosis accounts for the highest mortality from a bacterial infection on a global scale, questions persist regarding its origin. One hypothesis based on modern Mycobacterium tuberculosis complex (MTBC) genomes suggests their most recent common ancestor followed human migrations out of Africa approximately 70,000 years before present. However, studies using ancient genomes as calibration points have yielded much younger dates of less than 6000 years. Here, we aim to address this discrepancy through the analysis of the highest-coverage and highest-quality ancient MTBC genome available to date, reconstructed from a calcified lung nodule of Bishop Peder Winstrup of Lund (b. 1605, d. 1679).
Keywords: Ancient DNA; Metagenomics; Molecular dating; Mycobacterium tuberculosis; Tuberculosis
Year: 2020 PMID: 32778135 PMCID: PMC7418204 DOI: 10.1186/s13059-020-02112-1
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 CT image of Ranke complex. CT image of Peder Winstrup’s chest in a slightly angled axial plane. The short arrow shows a small calcified granuloma in the probable upper lobe of the collapsed right lung; together with two approximately 5 mm calcifications in the right hilum, it suggests a Ranke complex and previous primary tuberculosis. The more lateral of the two hilar calcifications was extracted for further analysis. In addition, calcifications in the descending aorta suggest atherosclerosis (arrowhead)
Fig. 2 Screening of sequencing data from LUND1 shows preservation of host and pathogen DNA. (a) Krona plots reflecting the metagenomic composition of the lung nodule. The majority of sequencing reads were aligned to Homo sapiens (n = 2,833,403), demonstrating extensive preservation of host DNA. A small portion of reads aligned to bacterial organisms, and 80% of these reads were assigned to the MTBC node (n = 1724). (b) Damage plots generated from sequencing reads mapped directly to a reconstructed MTBC ancestor genome [21], demonstrating a pattern characteristic of ancient DNA
Mapping statistics for LUND1 libraries
| Pre/post capture | Library treatment | Processed reads pre-mapping (n) | Unique mapped reads, quality-filtered (n) | Endogenous DNA (%) | Mean fold coverage | Mean fragment length (bp) | GC content (%) |
|---|---|---|---|---|---|---|---|
| Pre-capture | Non-UDG | 3,696,712 | 1458 | 0.045 | 0.018 | 54.31 | 63.89 |
| Post-capture | UDG | 59,091,507 | 9,482,901 | 45.652 | 141.5062 | 65.83 | 62.96 |
A comparison of the mapping statistics for the non-UDG screening library and UDG-treated MTBC enriched library of LUND1 when aligned to the MTBC ancestor genome [21]. For full EAGER output, see Additional File 2
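The mean fold coverage column can be approximated from the other columns as unique mapped reads times mean fragment length, divided by the reference length. The sketch below is an illustration only: the reference length of 4,411,532 bp is an assumption (the length of the M. tuberculosis H37Rv reference, taken here to apply to the ancestor genome as well) and is not stated in the table, and the function name is hypothetical. With that assumed length, the approximation lands close to the tabulated values for both libraries.

```python
# Assumed reference length in bp (M. tuberculosis H37Rv); not given in the table.
ASSUMED_GENOME_LENGTH_BP = 4_411_532


def mean_fold_coverage(unique_reads: int, mean_fragment_bp: float,
                       genome_bp: int = ASSUMED_GENOME_LENGTH_BP) -> float:
    """Approximate coverage as total mapped bases divided by reference length."""
    return unique_reads * mean_fragment_bp / genome_bp


# (unique mapped reads, mean fragment length in bp), copied from the table above.
libraries = {
    "pre-capture, non-UDG": (1_458, 54.31),
    "post-capture, UDG": (9_482_901, 65.83),
}

coverage = {
    name: mean_fold_coverage(reads, fragment_bp)
    for name, (reads, fragment_bp) in libraries.items()
}

for name, value in coverage.items():
    print(f"{name}: {value:.3f}x")
```

The approximation ignores soft clipping and any difference between fragment length and aligned length, so small deviations from pipeline-reported coverage would be expected.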
Model comparison for full MTBC dataset
| Model | Marginal likelihood | Mean rate (95% HPD) | Mean rate variance (95% HPD) | Mean tree height (95% HPD) |
|---|---|---|---|---|
| BDSKY+UCLD | −6125044.47176458 | 1.4488E−8 (9.4606E−9, 1.9632E−8) | 1.6881E−17 (5.4855E−18, 3.069E−17) | 3258.0478 (2189.5235, 4501.1384) |
| CC+UCLD | −6126017.15694528 | 1.214E−8 (7.1934E−9, 1.6448E−8) | 1.2459E−17 (2.833E−18, 2.3969E−17) | 4172.1961 (2585.2349, 6119.744) |
| SKY+UCLD | −6127733.35000634 | 1.2944E−8 (8.6149E−9, 1.7342E−8) | 1.3423E−17 (4.848E−18, 2.3869E−17) | 3650.4222 (2472.6434, 4992.0277) |
| CC+strict | −6125541.68118691 | 1.1573E−8 (8.6397E−9, 1.4509E−8) | NA | 4453.1162 (3330.1516, 5619.3974) |
Marginal likelihood and parameter estimates from four models applied to the full MTBC dataset: constant coalescent with uncorrelated lognormal clock (CC+UCLD), constant coalescent with strict clock (CC+strict), Bayesian skyline coalescent with uncorrelated lognormal clock (SKY+UCLD), and birth-death skyline with uncorrelated lognormal clock (BDSKY+UCLD). Marginal likelihoods obtained through path sampling (see the “Methods” section)
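One common way to read a table of marginal likelihoods is to rank the models and take differences, which gives log Bayes factors relative to the best-supported model. The sketch below does this for the four values in the table. It assumes the reported numbers are natural-log marginal likelihoods, as path sampling usually reports; the variable names are illustrative and not taken from the study.

```python
# Marginal likelihoods copied from the table above (assumed to be on the log scale).
marginal_loglik = {
    "BDSKY+UCLD": -6125044.47176458,
    "CC+UCLD": -6126017.15694528,
    "SKY+UCLD": -6127733.35000634,
    "CC+strict": -6125541.68118691,
}

# Rank models from highest (best supported) to lowest marginal likelihood.
ranked = sorted(marginal_loglik.items(), key=lambda item: item[1], reverse=True)
best_model, best_value = ranked[0]

# Log Bayes factor of the best model over each model (0 for the best model itself).
log_bayes_factors = {model: best_value - value for model, value in ranked}

for model, log_bf in log_bayes_factors.items():
    print(f"{best_model} vs {model}: log BF = {log_bf:.2f}")
```

On these numbers the BDSKY+UCLD model ranks first, which is consistent with its use for the maximum clade credibility tree.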
Fig. 3 MTBC maximum clade credibility tree. This MCC tree of mean heights was generated from the BDSKY+UCLD model as applied to the full MTBC dataset. Lineages are labeled on the right side. The ancient genomes are indicated by red asterisks and labeled on the side with their sample names. The outgroup is labeled as “M. canettii.” The 95% HPD intervals of the heights of nodes ancestral to each lineage are indicated as (lower boundary–upper boundary) in years before present. Ancestral nodes are highlighted by a circle colored to match the lineage label. The time scale is expressed as years before present, with the most recent time as 2010. The accompanying skyline plot can be found in Fig. S10 in Additional File 3
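Because the time scale counts years back from 2010, a node height can be turned into an approximate calendar year by subtraction. The sketch below applies this to the BDSKY+UCLD tree height and its 95% HPD bounds for the full MTBC dataset. The helper names are hypothetical, and the BCE labelling assumes the usual convention with no year zero (astronomical year 0 is 1 BCE).

```python
MOST_RECENT_YEAR = 2010.0  # most recent sampling time, as stated in the caption


def height_to_calendar_year(height_ybp: float,
                            most_recent_year: float = MOST_RECENT_YEAR) -> float:
    """Convert a node height (years before the most recent sample) to an astronomical year."""
    return most_recent_year - height_ybp


def format_year(astronomical_year: float) -> str:
    """Label a year as CE or BCE, assuming no year zero in the BCE/CE scheme."""
    year = round(astronomical_year)
    if year > 0:
        return f"{year} CE"
    return f"{1 - year} BCE"


# Mean tree height and 95% HPD bounds for BDSKY+UCLD, full MTBC dataset.
heights = {"mean": 3258.0478, "hpd_lower": 2189.5235, "hpd_upper": 4501.1384}
calendar = {key: format_year(height_to_calendar_year(h)) for key, h in heights.items()}

for key, label in calendar.items():
    print(f"{key}: {label}")
```

Rounding to whole years is only for display; the underlying estimates carry uncertainty of many centuries, as the HPD interval shows.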
Model comparison for L4 dataset
| Model | Marginal likelihood | Mean rate (95% HPD) | Mean rate variance (95% HPD) | Mean tree height (95% HPD) | Origin (BDSKY only) |
|---|---|---|---|---|---|
| BDSKY+UCLD | −6033864.2003 | 3.1885E−8 (1.9488E−8, 4.4007E−8) | 4.991E−17 (1.0674E−17, 8.9835E−17) | 1444.5416 (929.3966, 2083.7636) | NA |
| BDSKY+UCLD+origin | −60327945.1483 | 3.4761E−8 (2.447E−8, 4.5029E−8) | 5.5123E−17 (1.9718E−17, 9.4555E−17) | 1319.2463 (952.8702, 1761.4382) | 2310.916 (1165.2155, 3372.9253) |
| CC+UCLD | −6043356.1504 | 3.1068E−8 (1.988E−8, 4.1624E−8) | 4.3865E−17 (1.3291E−17, 7.806E−17) | 1569.0512 (1054.607, 2225.4758) | NA |
| SKY+UCLD | −6034698.3620 | 2.8097E−8 (1.5329E−8, 3.9927E−8) | 3.7609E−17 (6.0593E−18, 7.1919E−17) | 1690.536 (1016.2712, 2646.5163) | NA |
| CC+strict | −6034091.5119 | 2.9299E−8 (2.2173E−8, 3.6637E−8) | NA | 1567.544 (1186.1186, 1978.6488) | NA |
Selected parameter estimates from five models applied to the Lineage 4 dataset: constant coalescent with uncorrelated lognormal clock (CC+UCLD), constant coalescent with strict clock (CC+strict), Bayesian skyline coalescent with uncorrelated lognormal clock (SKY+UCLD), birth-death skyline with uncorrelated lognormal clock and tree conditioned on the root (BDSKY+UCLD), and birth-death skyline with uncorrelated lognormal clock with origin parameter estimate (BDSKY+UCLD+origin). Marginal likelihoods obtained through path sampling (see the “Methods” section)
Fig. 4 L4 maximum clade credibility tree. This MCC tree of mean heights was generated from the BDSKY+UCLD model as applied to the L4 dataset. Sublineages are labeled on the right side. The ancient genomes are indicated by red asterisks and labeled with their sample name. The Lineage 2 outgroup, represented by L2_N0020, is labeled on the side. The 95% HPD interval for node height is displayed for ancestral nodes of each sublineage as (lower boundary–upper boundary) in years before present. Ancestral nodes are highlighted by a circle colored to match the sublineage label. The time scale is expressed as years before present, with the most recent time as 2010. The accompanying skyline plot can be found in Fig. S13 in Additional File 3
Fig. 5 Substitution rate comparison across models and studies. Mean substitution rate per site per year for all models is expressed by a filled circle, with extended lines indicating the 95% HPD interval for that parameter. The Bos et al. [5] and Kay et al. [6] ranges are based on the reported rate values in each study. The Bos et al. [5] range is based on a full MTBC dataset, while the Kay et al. [6] range is based on an L4 dataset. All values presented here fall within one order of magnitude
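The one-order-of-magnitude statement can be partly checked against the mean rates in the two model comparison tables. The sketch below computes the ratio between the largest and smallest mean rate. It covers only the tabulated model means: the HPD bounds and the Bos et al. [5] and Kay et al. [6] ranges shown in the figure are not included, so this is a partial check rather than a reproduction of the figure.

```python
import math

# Mean substitution rates (per site per year) copied from the two model tables.
mean_rates = {
    "MTBC BDSKY+UCLD": 1.4488e-8,
    "MTBC CC+UCLD": 1.214e-8,
    "MTBC SKY+UCLD": 1.2944e-8,
    "MTBC CC+strict": 1.1573e-8,
    "L4 BDSKY+UCLD": 3.1885e-8,
    "L4 BDSKY+UCLD+origin": 3.4761e-8,
    "L4 CC+UCLD": 3.1068e-8,
    "L4 SKY+UCLD": 2.8097e-8,
    "L4 CC+strict": 2.9299e-8,
}

slowest = min(mean_rates, key=mean_rates.get)
fastest = max(mean_rates, key=mean_rates.get)

# Fold difference between the extremes, and the same spread in log10 units.
fold_difference = mean_rates[fastest] / mean_rates[slowest]
log10_spread = math.log10(fold_difference)

print(f"slowest: {slowest}, fastest: {fastest}")
print(f"fold difference: {fold_difference:.2f} (log10 spread {log10_spread:.2f})")
```

A log10 spread below 1 means the extremes differ by less than a factor of ten.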