Diana I Cruz-Dávalos, María A Nieves-Colón, Alexandra Sockell, G David Poznik, Hannes Schroeder, Anne C Stone, Carlos D Bustamante, Anna-Sapfo Malaspinas, María C Ávila-Arcos.
Abstract
BACKGROUND: As most ancient biological samples have low levels of endogenous DNA, it is advantageous to enrich for specific genomic regions prior to sequencing. One approach, in-solution capture-enrichment, retrieves sequences of interest and reduces the fraction of microbial DNA. In this work, we implement a capture-enrichment approach targeting informative regions of the Y chromosome in six human archaeological remains excavated in the Caribbean and dated between 200 and 3000 years BP. We compare the recovery rates of Y-chromosome capture (YCC) alone and of whole-genome capture followed by YCC (WGC + YCC) against those of non-enriched (pre-capture) libraries.
Keywords: Ancient DNA; Capture-enrichment; Y chromosome
Year: 2018 PMID: 30107783 PMCID: PMC6092841 DOI: 10.1186/s12864-018-4945-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Samples, methods, total number (total data) and average (down-sampled data) number of reads, and fold-enrichment (down-sampled data)
| | | | STM1 | STM2 | PI174 | PI383 | PI435 | PI437 |
|---|---|---|---|---|---|---|---|---|
| Endogenous content (pre-capture) | | | 1.54% | 0.12% | 0.01% | 0.03% | 0.04% | 0.02% |
| Site | | | Saint Martin | Saint Martin | Paso del Indio | Paso del Indio | Paso del Indio | Paso del Indio |
| Methods | Extraction | | Rohland et al., 2007 | Rohland et al., 2007 | Dabney et al., 2013 | Dabney et al., 2013 | Dabney et al., 2013 | Dabney et al., 2013 |
| | Library building | | Meyer and Kircher, 2010 | Meyer and Kircher, 2010 | Meyer and Kircher, 2010 | Meyer and Kircher, 2010 | Meyer and Kircher, 2010 | Meyer and Kircher, 2010 |
| | WGC | | MYbaits (MYcroarray, Ann Arbor) | MYbaits (MYcroarray, Ann Arbor) | WISC (Carpenter et al., 2013) | WISC (Carpenter et al., 2013) | WISC (Carpenter et al., 2013) | WISC (Carpenter et al., 2013) |
| | YCC | | YCC (this study) | YCC (this study) | YCC (this study) | YCC (this study) | YCC (this study) | YCC (this study) |
| | WGC + YCC | | MYbaits + YCC | MYbaits + YCC | WISC + YCC | WISC + YCC | WISC + YCC | WISC + YCC |
| Total data | Pre-capture | Total reads | 34,025,874 | 14,973,474 | 9,173,100 | 2,795,632 | 4,617,394 | 9,986,135 |
| | | Mapping to chrY | 1384 | 301 | 4 | 3 | 6 | 6 |
| | | On-target | 924 | 197 | 1 | 2 | 3 | 2 |
| | | % of sequenced on-target | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| | | % of chrY on-target | 67% | 65% | 25% | 67% | 50% | 33% |
| | YCC | Total reads | 68,795 | 41,884 | 159,728 | 107,796 | 41,810 | 56,157 |
| | | Mapping to chrY | 16,430 | 2562 | 17 | 27 | 26 | 12 |
| | | On-target | 16,191 | 2541 | 17 | 27 | 25 | 12 |
| | | % of sequenced on-target | 23.54% | 6.07% | 0.01% | 0.03% | 0.06% | 0.02% |
| | | % of chrY on-target | 99% | 99% | 100% | 100% | 96% | 100% |
| | WGC | Total reads | 14,973,474 | 29,884,294 | 10,052,798 | 10,265,430 | 12,363,127 | 9,132,306 |
| | | Mapping to chrY | 6407 | 1925 | 5 | 8 | 57 | 7 |
| | | On-target | 2930 | 903 | 2 | 6 | 29 | 4 |
| | | % of sequenced on-target | 0.020% | 0.003% | 0.000% | 0.000% | 0.000% | 0.000% |
| | | % of chrY on-target | 46% | 47% | 40% | 75% | 51% | 57% |
| | WGC + Y | Total reads | 98,540 | 30,629 | 17,212,495 | 14,557,575 | 10,181,433 | 11,356,277 |
| | | Mapping to chrY | 4854 | 236 | 37 | 152 | 242 | 47 |
| | | On-target | 4643 | 222 | 37 | 150 | 235 | 45 |
| | | % of sequenced on-target | 4.71% | 0.72% | 0.00% | 0.00% | 0.00% | 0.00% |
| | | % of chrY on-target | 96% | 94% | 100% | 99% | 97% | 96% |
| Down-sampled data | Pre-capture | Total reads | 68,795 | 30,629 | 159,728 | 107,796 | 41,810 | 56,157 |
| | | Mapping to chrY | 2.8 | 0.2 | – | 0.2 | 0.1 | – |
| | | On-target | 2.3 | 0.2 | – | – | 0.1 | – |
| | | % of sequenced on-target | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| | | % of chrY on-target | 82% | 100% | – | – | 100% | – |
| | YCC | Total reads | 68,795 | 30,629 | 159,728 | 107,796 | 41,810 | 56,157 |
| | | Mapping to chrY | 16,430 | 1925 | 17 | 27 | 26 | 12 |
| | | On-target | 16,191.0 | 1909.8 | 17.0 | 27.0 | 25.0 | 12.0 |
| | | % of sequenced on-target | 23.54% | 6.24% | 0.01% | 0.03% | 0.06% | 0.02% |
| | | % of chrY on-target | 99% | 99% | 100% | 100% | 96% | 100% |
| | WGC | Total reads | 68,795 | 30,629 | 159,728 | 107,796 | 41,810 | 56,157 |
| | | Mapping to chrY | 38.8 | 2.7 | 0.2 | 0.1 | 0.2 | – |
| | | On-target | 16.8 | 1.5 | 0.1 | 0.1 | – | – |
| | | % of sequenced on-target | 0.02% | 0.00% | 0.00% | 0.00% | 0.00% | 0.00% |
| | | % of chrY on-target | 43% | 56% | 50% | 100% | 0% | – |
| | WGC + Y | Total reads | 68,795 | 30,629 | 159,728 | 107,796 | 41,810 | 56,157 |
| | | Mapping to chrY | 4414.0 | 236.0 | 17.1 | 82.5 | 111.9 | 25.5 |
| | | On-target | 4239.5 | 222.0 | 17.1 | 81.3 | 109.7 | 24.5 |
| | | % of sequenced on-target | 6.16% | 0.72% | 0.01% | 0.08% | 0.26% | 0.04% |
| | | % of chrY on-target | 96% | 94% | 100% | 99% | 98% | 96% |
| | Condition 1 | Condition 2 | | | | | | |
| Fold-enrichment (down-sampled data) | YCC | Pre-capture | 7039.6 | 9549.0 | 17.0 | 27.0 | 250.0 | 12.0 |
| | YCC | WGC | 963.8 | 1273.2 | 170.0 | 270.0 | 25.0 | 12.0 |
| | YCC | WGC + YCC | 3.8 | 8.6 | 1.0 | 0.3 | 0.2 | 0.5 |
| | WGC | Pre-capture | 7.3 | 7.5 | 0.1 | 0.1 | 0.0 | 0.0 |
| | WGC + YCC | Pre-capture | 1843.3 | 1110.0 | 17.1 | 81.3 | 1097.0 | 24.5 |
| | WGC + YCC | WGC | 252.4 | 148.0 | 171.0 | 813.0 | 109.7 | 24.5 |
Ten replicates per library were obtained by down-sampling to the minimum number of retained reads within each sample (underlined in “Total data” rows). The “Mapping to chrY” rows give the number of unique reads mapping to the Y chromosome. “On-target” refers to the unique on-target reads. “% of sequenced on-target” and “% of chrY on-target” refer to the percentage of on-target reads with respect to the total sequenced reads and to the total reads mapping to the Y chromosome, respectively. The fold-enrichments were calculated with the down-sampled data, by dividing the number of on-target reads in Condition 1 by the number of on-target reads in Condition 2; when the denominator was 0, we assigned the number of on-target reads in Condition 1.
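The fold-enrichment rule in the note above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `fold_enrichment` and `percent_on_target` are hypothetical, and the sketch assumes that a "–" entry in the down-sampled rows stands for zero on-target reads. The input values are the down-sampled on-target means for STM1 and PI174 copied from the table.

```python
def fold_enrichment(on_target_cond1: float, on_target_cond2: float) -> float:
    """On-target reads in Condition 1 divided by those in Condition 2.

    When Condition 2 has no on-target reads, the Condition 1 count is
    returned unchanged, following the rule stated in the table note.
    """
    if on_target_cond2 == 0:
        return on_target_cond1
    return on_target_cond1 / on_target_cond2


def percent_on_target(on_target: float, total_reads: float) -> float:
    """Percentage of reads that are on-target out of total_reads."""
    return 100.0 * on_target / total_reads


# STM1, YCC vs. pre-capture (down-sampled on-target means: 16,191.0 and 2.3).
stm1_ycc_vs_pre = fold_enrichment(16191.0, 2.3)

# PI174, YCC vs. pre-capture: the pre-capture entry is "–", assumed to be 0.
pi174_ycc_vs_pre = fold_enrichment(17.0, 0.0)

print(round(stm1_ycc_vs_pre, 1))   # 7039.6, as in the fold-enrichment rows
print(pi174_ycc_vs_pre)            # 17.0, the numerator is reported as-is
print(round(percent_on_target(16191.0, 68795), 2))  # 23.54
```

Because the zero-denominator case returns a raw read count rather than a ratio, fold-enrichment values for the low-content Paso del Indio samples are not directly comparable with those for the Saint Martin samples.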
Fig. 1 Experimental enrichment scheme. We have four different conditions: Pre-capture, YCC, WGC, and WGC + YCC. The pre-capture condition is our initial library preparation prior to any enrichment. The WGC is designed to target all autosomal and sex chromosomes. The Y-capture in the YCC and the WGC + YCC conditions targets ~10.3 Mb of Y-chromosome sequence.
Fig. 2 Endogenous DNA content in enriched libraries. Percentage of the unique retained reads that aligned to the human genome in (a) Saint Martin and (b) Puerto Rico samples. “STM” stands for Saint Martin and “PI” for Paso del Indio, Puerto Rico. The values in parentheses below the x-axis indicate the number of down-sampled reads per library. The error bars represent 95% confidence intervals of endogenous DNA content found in the samples across the 10 down-sampled replicates. Darker colors correspond to the proportion of the unique reads that aligned to the Y chromosome. Whole-genome enriched libraries have <0.04% of reads aligning to the Y chromosome.
Fig. 3 On- and off-target reads. We show the mean and the 95% confidence interval for 10 replicates. (a) Unique reads mapping to the Y-chromosome target regions. (b) Unique reads mapping to the Y chromosome but not to the targeted regions. For some libraries, no reads mapped to the Y chromosome across the 10 replicates.
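The replicate summary behind this figure (ten down-sampled replicates, then a mean and a 95% confidence interval) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the source does not say how the intervals were computed, so a Student t interval is assumed here; the read library is a toy list of booleans (True for an on-target read); and the names `downsample_counts` and `mean_ci95` are hypothetical.

```python
import math
import random
import statistics

# Two-sided 95% Student t critical value for 9 degrees of freedom (10 replicates).
T_CRIT_DF9 = 2.262


def downsample_counts(read_labels, n_reads, n_replicates=10, seed=0):
    """Count on-target reads in each random subsample drawn without replacement."""
    rng = random.Random(seed)
    return [sum(rng.sample(read_labels, n_reads)) for _ in range(n_replicates)]


def mean_ci95(values):
    """Return (mean, lower, upper) for a t-based 95% interval over 10 values."""
    if len(values) != 10:
        raise ValueError("T_CRIT_DF9 only applies to exactly 10 replicates")
    mean = statistics.mean(values)
    half_width = T_CRIT_DF9 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


# Toy library (an assumption): 1000 reads, 50 of them on-target.
library = [True] * 50 + [False] * 950
counts = downsample_counts(library, n_reads=200)
mean, lower, upper = mean_ci95(counts)
print(counts)
print(f"mean={mean:.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```

Averaging over replicates is what produces the fractional read counts in the down-sampled rows of the table, such as 2.3 on-target reads.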
Fig. 4 Depth of coverage across the Y chromosome. From top to bottom, rows depict the coverage levels for the pre-capture, YCC, WGC and WGC + YCC conditions. Red boxes represent the targeted regions. Each blue point represents sequencing coverage within a 1000-bp window, averaged across 10 subsampled replicates per sample per condition, which explains depths of coverage below 1. To help with readability, we increased the opacity of the points in the PI383 column.
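The windowed averaging described in this caption can be sketched as below, which also shows why averaged depths fall below 1. This is a hedged illustration rather than the authors' method: alignments are assumed to be 0-based, half-open `(start, end)` intervals on the Y chromosome, and the function names `window_depth` and `average_replicates` are hypothetical.

```python
from collections import defaultdict

WINDOW = 1000  # window size in bp, as stated in the caption


def window_depth(alignments, window=WINDOW):
    """Mean depth per window from (start, end) alignment intervals."""
    covered = defaultdict(int)  # window index -> number of aligned bases
    for start, end in alignments:
        pos = start
        while pos < end:
            index = pos // window
            stop = min(end, (index + 1) * window)  # clip at the window boundary
            covered[index] += stop - pos
            pos = stop
    return {index: bases / window for index, bases in covered.items()}


def average_replicates(replicate_depths):
    """Average per-window depth across replicates; missing windows count as 0."""
    n = len(replicate_depths)
    windows = set().union(*replicate_depths)
    return {w: sum(d.get(w, 0.0) for d in replicate_depths) / n
            for w in sorted(windows)}


# Toy example: one 60-bp read in replicate 1, no reads in replicate 2.
rep1 = window_depth([(100, 160)])
rep2 = window_depth([])
print(average_replicates([rep1, rep2]))  # window 0 has an average depth of about 0.03
```

A single short read covers only a small part of a 1000-bp window, and replicates without any read in that window pull the average down further.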
Fig. 5 Lengths of mapped reads. (a) Reads aligned to the nuclear genome. (b) On-target reads. (c) and (d) depict the length distributions of reads mapping to the whole genome for STM1 and STM2 samples, respectively. The length distribution was smoothed by fitting a polynomial curve to the observed frequencies; the ribbons correspond to 95% confidence intervals.
Fig. 6 Clonality. (a) Clonal reads mapping to the nuclear human genome. (b) Clonality of the reads mapping to the targeted regions. (c) Clonal reads mapping to the Y chromosome but not to the targeted regions. Error bars represent 95% confidence intervals across 10 subsampled replicates.
Fig. 7 Expected yield and on-target fold-enrichment. Dashed lines indicate the number of down-sampled reads. (a) and (b): Predicted median value and variance (across 100 bootstrap replicates) of the number of on-target reads, as a function of total sequenced reads. The points depict the observed numbers of on-target reads in the down-sampled libraries. (c) and (d): Expected enrichment of on-target reads versus number of sequenced reads for each condition and each sample.
Numbers of Y-chromosome bases, Y-SNPs and haplogroups retrieved
| Sample | Condition | Positions recovered | Total SNPs | Ancestral SNPs | Derived SNPs | Haplogroup retrieved |
|---|---|---|---|---|---|---|
| STM1 | All | 2,205,331 | 12,061 | 11,625 | 436 | R1b-M343 |
| | Pre-capture | 93,725 | 595 | 577 | 18 | R1b-M343 |
| | YCC | 1,372,731 | 8103 | 7805 | 298 | R1b-M343 |
| | WGC | 354,255 | 2414 | 2321 | 93 | R1b-M343 |
| | WGC + YCC | 493,085 | 2971 | 2871 | 100 | R1b-M343 |
| STM2 | All | 428,846 | 2183 | 2091 | 92 | E1b1a1a1-M80 |
| | Pre-capture | 19,994 | 109 | 100 | 9 | E1b1a1a1-M80 |
| | YCC | 220,732 | 1262 | 1215 | 47 | E1b1a1-M2 |
| | WGC | 109,411 | 760 | 726 | 34 | E1b1a1a1-M80 |
| | WGC + YCC | 219,152 | 114 | 110 | 4 | CT-M168 |
| PI174 | All | 3224 | 19 | 18 | 1 | A1-V168 |
| | Pre-capture | 129 | 0 | 0 | 0 | – |
| | YCC | 1993 | 12 | 11 | 1 | A1-V168 |
| | WGC | 147 | 2 | 2 | 0 | A1-V168 |
| | WGC + YCC | 1818 | 11 | 10 | 1 | A1-V168 |
| PI383 | All | 7738 | 46 | 45 | 1 | P-M45 |
| | Pre-capture | 146 | 0 | 0 | 0 | – |
| | YCC | 1809 | 13 | 12 | 1 | P-M45 |
| | WGC | 473 | 5 | 5 | 0 | P-M45 |
| | WGC + YCC | 6890 | 42 | 41 | 1 | P-M45 |
| PI435 | All | 16,469 | 100 | 97 | 3 | BT-M42 |
| | Pre-capture | 164 | 0 | 0 | 0 | – |
| | YCC | 1918 | 15 | 15 | 0 | BT-M42 |
| | WGC | 2632 | 18 | 18 | 0 | – |
| | WGC + YCC | 12,399 | 86 | 83 | 3 | BT-M42 |
| PI437 | All | 3444 | 14 | 14 | 0 | – |
| | Pre-capture | 103 | 0 | 0 | 0 | – |
| | YCC | 938 | 6 | 6 | 0 | – |
| | WGC | 296 | 0 | 0 | 0 | – |
| | WGC + YCC | 2320 | 11 | 11 | 0 | – |