Avery Davis Bell1,2, Curtis J Mello3,4, James Nemesh3,4, Sara A Brumbaugh3,4, Alec Wysoker3,4, Steven A McCarroll5,6. 1. Department of Genetics, Harvard Medical School, Boston, MA, USA. averydavisbell@gmail.com. 2. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. averydavisbell@gmail.com. 3. Department of Genetics, Harvard Medical School, Boston, MA, USA. 4. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. 5. Department of Genetics, Harvard Medical School, Boston, MA, USA. mccarroll@genetics.med.harvard.edu. 6. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA. mccarroll@genetics.med.harvard.edu.
Abstract
Meiosis, although essential for reproduction, is also variable and error-prone: rates of chromosome crossover vary among gametes, between the sexes, and among humans of the same sex, and chromosome missegregation leads to abnormal chromosome numbers (aneuploidy)1-8. To study diverse meiotic outcomes and how they covary across chromosomes, gametes and humans, we developed Sperm-seq, a way of simultaneously analysing the genomes of thousands of individual sperm. Here we analyse the genomes of 31,228 human gametes from 20 sperm donors, identifying 813,122 crossovers and 787 aneuploid chromosomes. Sperm donors had aneuploidy rates ranging from 0.01 to 0.05 aneuploidies per gamete; crossovers partially protected chromosomes from nondisjunction at the meiosis I cell division. Some chromosomes and donors underwent more-frequent nondisjunction during meiosis I, and others showed more meiosis II segregation failures. Sperm genomes also manifested many genomic anomalies that could not be explained by simple nondisjunction. Diverse recombination phenotypes-from crossover rates to crossover location and separation, a measure of crossover interference-covaried strongly across individuals and cells. Our results can be incorporated with earlier observations into a unified model in which a core mechanism, the variable physical compaction of meiotic chromosomes, generates interindividual and cell-to-cell variation in diverse meiotic phenotypes.
One way to learn about human meiosis has been to study how genomes are
inherited across generations. Genotype data are available for millions of people and
thousands of families; crossover locations are estimated from genomic segment
sharing among relatives and linkage-disequilibrium patterns in populations[2,4,7,9,10].
Although inheritance studies sample only the few gametes per individual that
generate offspring, such analyses have revealed that average crossover number and
crossover location associate with common variants at many genomic loci[3-6,11,12].
Another powerful approach to studying meiosis is to directly visualize
meiotic processes in gametocytes, which has made it possible to see that homologous
chromosomes usually begin synapsis (their physical connection) near their
telomeres[13-15]; to observe double-strand breaks,
a subset of which progress to crossovers, by monitoring proteins that bind to such
breaks[16,17]; and to detect adverse meiotic outcomes,
such as chromosome mis-segregation[18,19]. Studies based on
such methods have revealed much cell-to-cell variation in features such as the
physical compaction of meiotic chromosomes[20,21].
More recently, human meiotic phenotypes have been studied via genotyping or
sequencing up to 100 gametes from one person, demonstrating that crossovers and
aneuploidy can be ascertained from direct analysis of gamete genomes[22-26]. Despite these advances, it has not yet been possible to
measure multiple meiotic phenotypes genome-wide in many individual gametes from many
people.
Development of Sperm-seq
We developed a method (“Sperm-seq”) with which to sequence
thousands of sperm genomes quickly and simultaneously (Fig. 1). A key challenge in developing Sperm-seq was
to deliver thousands of molecularly accessible-but-intact sperm genomes to
individual nanoliter-scale droplets in solution. Tightly compacted[27] sperm genomes are difficult to
access enzymatically without loss of their DNA into solution; we accomplished
this by decondensing sperm nuclei using reagents that mimic the molecules with
which the egg gently unpacks the sperm pronucleus (Extended Data Fig. 1a-d). These
sperm DNA “florets” were then encapsulated into droplets together
with beads that delivered unique DNA barcodes for incorporation into each
sperm’s genomic DNA; we adapted three technologies for this purpose
(Drop-seq[28], 10X
Chromium Single Cell DNA, and 10X GemCode[29]; the data in this study were generated with GemCode)
(Extended Data Fig. 1e-f). We then developed, adapted, and integrated
computational methods for determining the chromosomal phase of each
donor’s sequence variants and for inferring the ploidy and crossovers of
each chromosome in each cell.
Fig. 1.
“Sperm-seq” overview.
Schematic of our droplet-based single-sperm sequencing method.
Extended Data Fig. 1.
Characterization of egg-mimic sperm preparation and optimization of
bead-based single-sperm sequencing.
a-c, Two-channel fluorescence plots showing the results
of droplet digital PCR (ddPCR) with input template noted in each title,
demonstrating that two loci (from different chromosomes) are detectable in
the same droplet far more often when sperm DNA florets (rather than purified
DNA) are used as input. Each point represents one droplet. Gray points in
the bottom left quadrant represent droplets in which neither template
molecule was detected; blue points in the top left quadrant represent
droplets in which the assay detected a template molecule for the locus on
chromosome 7; green points in the bottom right quadrant represent droplets
in which the assay detected a template molecule for the locus on chromosome
10; and brown points in the top right quadrant represent droplets in which
both loci were detected. With a high concentration of purified DNA as input
(a), comparatively fewer droplets contain both loci than
when untreated (b) or treated (c) sperm were used
as input. Sperm “florets” treated with the egg-mimicking
decondensation protocol had a much higher fraction of droplets containing
both loci than purified DNA (compare a and c,
right, high-input treated sperm) and had more-sensitive ascertainment and
cleaner results (quadrant separation) than untreated sperm (compare
b and c, left, low-input sperm and treated
sperm). The pink lines in (b) delineate the boundaries between
droplets categorized as negative or positive for each assay. d,
Optimization of sperm preparation: Characterization of the effect of
different lengths of 37°C incubation of sperm cells treated with
egg-mimicking decondensation reagents on how often the loci on chromosomes 7
and 10 were detected in the same ddPCR droplet. Y axis, the percentage of
molecules calculated to be linked to each other (i.e.
physically linked in input) for assays targeting chromosomes 7 and 10.
Extracted DNA (a negative control) gives the expected result of random
assortment of the two template molecules into droplets (first bar). The
45-minute heat treatment was used for all subsequent experiments in this
study. e and f, Distribution of sequence reads
across cell barcodes from droplet-based single-sperm sequencing. Each panel
shows the cumulative fraction (y-axis) of all reads from a sequencing run
coming from each read-number-ranked cell barcode; a sharp inflection point
delineates the barcodes with many reads from those with few reads. Points to
the left of the inflection point are the cell barcodes that associated with
many reads (i.e., beads that co-encapsulated with cells); the height of the
inflection point reflects the proportion of the sequence reads that come
from these barcodes. Only reads that mapped to the human genome (hg38) and
were not PCR duplicates are included. e, Data from an initial
adaptation of 10X Genomics’ GemCode linked-reads system[29], in which only a small proportion
of the reads came from cell barcodes associated with putative cells.
f, Data from the final, implemented adaptation of 10X
Genomics’ GemCode linked reads system[29] for the same number of input sperm
nuclei as in e. Note that this x axis includes one-fifth as many
barcodes as that in (e).
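The barcode-selection step described for panels e and f can be sketched in Python. This is a minimal illustration, assuming one common knee heuristic (the rank at which the cumulative fraction of reads exceeds the fraction of barcodes by the largest margin); the function name `call_cell_barcodes` and the toy read counts are hypothetical, and the criterion the study used to place the inflection point may differ.

```python
def call_cell_barcodes(read_counts: dict[str, int]) -> list[str]:
    """Return the barcodes ranked before the knee of the cumulative-read curve.

    Barcodes are ranked by read number; the knee is taken as the rank k that
    maximizes (cumulative fraction of reads) - (k / number of barcodes).
    This heuristic is an assumption, not the study's documented rule.
    """
    ranked = sorted(read_counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(count for _, count in ranked)
    n = len(ranked)
    best_k, best_gap, cumulative = 0, float("-inf"), 0
    for k, (_, count) in enumerate(ranked, start=1):
        cumulative += count
        gap = cumulative / total - k / n
        if gap > best_gap:
            best_k, best_gap = k, gap
    return [barcode for barcode, _ in ranked[:best_k]]


# Toy run: 10 beads that captured a sperm (many reads) and 90 empty beads.
counts = {f"cell{i}": 1000 for i in range(10)}
counts.update({f"empty{i}": 10 for i in range(90)})
print(len(call_cell_barcodes(counts)))  # 10
```

The curve's steep initial segment corresponds to the points left of the inflection in panels e and f; the better the adaptation, the higher that segment rises.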
We used this combination of molecular and computational approaches to
analyze 31,228 sperm cells from 20 sperm donors (974–2,274 gametes per
donor), sequencing a median of ~1% of the haploid genome of each cell (Extended Data Table 1). Deeper sequencing
allows detection of ~10% of a gamete’s genome.
Extended Data Table 1.
Sperm donor and single-sperm sequencing characteristics and results.
Donor | Ancestry* | Cells (number, excluding cell and bead doublets) | Reads per cell (median, thousands) | Genome covered per cell (median, %) | Heterozygous SNPs in genome (millions) | Unique heterozygous SNP alleles observed per cell (median, thousands) | Crossovers observed (total, thousands) | Crossovers per cell (mean) | Crossover resolution (median, kb) | Autosomal aneuploidy events (% of cells)† | Sex chromosome aneuploidy events (% of cells)†
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Overall | -- | 31,228‡ | 211§ | 1.0∥ | -- | 24.6§ | 813‡ | 26.11∥ | 240¶ | 1.6§ | 0.9§
NC1 | Eur. | 982 | 284 | 1.4 | 1.95 | 31.6 | 26 | 26.31 | 189 | 1.5 | 0.6
NC2 | Eur. | 1,680 | 163 | 0.8 | 1.98 | 18.2 | 37 | 22.19 | 307 | 2.0 | 0.7
NC3 | Eur. | 1,289 | 190 | 0.9 | 1.94 | 21.5 | 36 | 28.13 | 260 | 1.7 | 0.7
NC4 | Eur. | 1,482 | 243 | 1.1 | 1.98 | 26.8 | 40 | 26.98 | 243 | 1.1 | 0.5
NC6 | Afr. Am. | 1,370 | 154 | 0.8 | 2.53 | 23.8 | 38 | 27.57 | 253 | 0.7 | 0.3
NC8 | As. | 1,663 | 304 | 1.5 | 1.81 | 30.9 | 45 | 26.98 | 229 | 3.1 | 0.5
NC9 | As. | 1,894 | 245 | 1.2 | 1.79 | 25.6 | 53 | 27.98 | 231 | 0.8 | 1.5
NC10 | As. | 1,154 | 224 | 1.1 | 1.82 | 23.3 | 29 | 24.99 | 257 | 1.5 | 0.3
NC11 | Eur. | 1,930 | 202 | 1.0 | 1.92 | 22.8 | 50 | 25.82 | 242 | 1.3 | 0.4
NC12 | Eur. | 2,145 | 179 | 0.9 | 1.91 | 20.6 | 51 | 23.76 | 270 | 1.2 | 1.7
NC13 | Eur. | 1,514 | 259 | 1.2 | 1.92 | 28.3 | 41 | 27.19 | 202 | 0.9 | 1.0
NC14 | Eur. | 1,336 | 296 | 1.4 | 1.92 | 32.4 | 36 | 26.65 | 175 | 2.5 | 1.2
NC15 | Eur. | 1,702 | 211 | 1.0 | 1.93 | 23.2 | 42 | 24.80 | 268 | 1.0 | 0.9
NC16 | Eur. | 1,785 | 241 | 1.2 | 1.92 | 26.9 | 42 | 23.78 | 227 | 2.2 | 1.3
NC17 | Eur. | 1,504 | 220 | 1.0 | 1.94 | 23.8 | 39 | 25.92 | 250 | 2.1 | 0.7
NC18 | Eur. | 1,589 | 170 | 0.8 | 1.93 | 18.4 | 42 | 26.48 | 317 | 1.2 | 0.6
NC22 | Afr. Am. | 1,693 | 195 | 0.9 | 2.53 | 29.7 | 44 | 25.96 | 205 | 1.4 | 0.7
NC25 | Afr. Am. | 2,274 | 175 | 0.8 | 2.47 | 25.8 | 62 | 27.31 | 211 | 2.8 | 1.8
NC26 | Afr. Am., As. | 974 | 120 | 0.6 | 2.55 | 18.0 | 26 | 26.67 | 355 | 1.3 | 0.4
NC27 | As. (?) | 1,268 | 267 | 1.3 | 1.96 | 29.2 | 34 | 26.80 | 199 | 1.7 | 0.6
* As provided by sperm bank. Afr. Am., of African American ancestry; Eur., of European ancestry; As., of Asian ancestry; (?), conflicting ancestry information given.
† Total number of aneuploidy events divided by the total number of cells, multiplied by 100; cells can have more than one event.
‡ Sum across all cells from all sperm donors.
§ Median or mean across all individual cells from all sperm donors (31,228 measurements summarized).
∥ Median or mean of aggregate metrics across samples (20 measurements summarized).
¶ Median across all crossovers (813,122 measurements summarized).
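The footnote symbols distinguish summaries pooled over all cells from summaries of the 20 per-donor values. The short Python check below uses values transcribed from the table to show how the Overall row's cell total, median coverage and mean crossovers per cell follow from the donor rows, and how a mean of donor means differs slightly from a mean pooled over all cells. The variable names are illustrative only.

```python
import statistics

# Per-donor values transcribed from Extended Data Table 1 (NC1 ... NC27, in table order).
cells = [982, 1680, 1289, 1482, 1370, 1663, 1894, 1154, 1930, 2145,
         1514, 1336, 1702, 1785, 1504, 1589, 1693, 2274, 974, 1268]
genome_covered_pct = [1.4, 0.8, 0.9, 1.1, 0.8, 1.5, 1.2, 1.1, 1.0, 0.9,
                      1.2, 1.4, 1.0, 1.2, 1.0, 0.8, 0.9, 0.8, 0.6, 1.3]
xo_per_cell = [26.31, 22.19, 28.13, 26.98, 27.57, 26.98, 27.98, 24.99,
               25.82, 23.76, 27.19, 26.65, 24.80, 23.78, 25.92, 26.48,
               25.96, 27.31, 26.67, 26.80]

total_cells = sum(cells)                                  # footnote ‡: sum over donors
median_coverage = statistics.median(genome_covered_pct)   # footnote ∥: 20 donor values
mean_of_donor_means = statistics.mean(xo_per_cell)        # footnote ∥: 20 donor values
pooled_mean = 813_122 / total_cells                       # every cell weighted equally

print(total_cells, median_coverage)                            # 31228 1.0
print(round(mean_of_donor_means, 2), round(pooled_mean, 2))    # 26.11 26.04
```

Because donors contributed different numbers of cells, the pooled mean weights each gamete equally, whereas the Overall row's 26.11 weights each donor equally.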
Sperm-seq enabled inference of donors’ haplotypes along the full
length of every chromosome: alleles from the same parental chromosome tend to
appear in the same gametes, so the co-appearance patterns of alleles across many
sperm enabled alleles to be assembled into chromosome-length haplotypes (Extended Data Fig. 2a, Methods). In silico simulations and
comparisons to kilobase-scale haplotypes from population-based analyses
indicated that Sperm-seq assigned alleles to haplotypes with 97.5–100%
accuracy (Extended Data Fig. 2b,c, Supplementary Notes).
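The co-appearance logic can be illustrated with a deliberately simplified Python sketch. It assumes a single window with no crossovers, so that every sperm carries one parental haplotype throughout; same-cell co-observations then vote on whether two SNPs' alleles lie on the same haplotype, and SNPs are added greedily to a growing phased set. The function names, the greedy rule and the simulation parameters (200 cells, 50 SNPs, 20% of sites observed per cell, far denser than real Sperm-seq data) are assumptions for illustration, not the phasing pipeline described in Methods.

```python
import random
from collections import defaultdict


def phase_window(observations, n_snps):
    """Greedy phasing sketch for one crossover-free window (hypothetical helper).

    observations: dict cell -> {snp_index: observed allele (0 or 1)}.
    Returns phase[s], the allele of SNP s assigned to haplotype A
    (defined only up to a global swap of A and B).
    """
    # Signed co-occurrence: alleles seen in the same cell agree (+1) when they
    # sit on the same haplotype and disagree (-1) otherwise.
    weight = defaultdict(float)
    for alleles in observations.values():
        items = sorted(alleles.items())
        for i, (s, a) in enumerate(items):
            for t, b in items[i + 1:]:
                w = 1.0 if a == b else -1.0
                weight[s, t] += w
                weight[t, s] += w

    phase = {0: 0}  # anchor: allele 0 of SNP 0 is placed on haplotype A
    while len(phase) < n_snps:
        best, best_score = None, 0.0
        for s in range(n_snps):
            if s in phase:
                continue
            # Score > 0 favours placing allele 0 of SNP s on haplotype A.
            score = sum(weight.get((s, t), 0.0) * (1 if p == 0 else -1)
                        for t, p in phase.items())
            if best is None or abs(score) > abs(best_score):
                best, best_score = s, score
        phase[best] = 0 if best_score >= 0 else 1
    return [phase[s] for s in range(n_snps)]


def simulate(n_cells=200, n_snps=50, p_obs=0.2, error=0.0, seed=1):
    """Simulate sparse allele observations from two known haplotypes."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(n_snps)]  # allele on haplotype A
    obs = {}
    for c in range(n_cells):
        on_a = rng.random() < 0.5  # which haplotype this sperm inherited
        alleles = {}
        for s in range(n_snps):
            if rng.random() < p_obs:
                a = truth[s] if on_a else 1 - truth[s]
                if rng.random() < error:  # sequencing/ambient-DNA noise
                    a = 1 - a
                alleles[s] = a
        obs[c] = alleles
    return truth, obs


def accuracy(truth, phase):
    """Fraction of SNPs phased correctly, allowing a global haplotype swap."""
    agree = sum(t == p for t, p in zip(truth, phase)) / len(truth)
    return max(agree, 1 - agree)


truth, obs = simulate(seed=1)
print(accuracy(truth, phase_window(obs, len(truth))))
```

Over a whole chromosome, crossovers break the one-haplotype-per-cell assumption at long range, so a chromosome-scale method would need to phase locally and then join windows; this sketch does not attempt that.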
Extended Data Fig. 2.
Evaluation of chromosomal phasing and identification of cell
doublets.
a, Phasing strategy. Green and purple denote the
chromosomal phase of each allele (unknown before analysis). Each sperm cell
carries one parental haplotype (green or purple) except where a
recombination event separates consecutively observed SNPs (red
“X” in bottom sperm). Because alleles from the same haplotype
will tend to be observed in the same sperm cells, the haplotype arrangement
of the alleles can be assembled at whole-chromosome scale. b,
Evaluation of our phasing method using 1,000 simulated single-sperm genomes
(generated from two a priori known parental haplotypes and
sampled at various levels of coverage). Since cell doublets (which combine
two haploid genomes and potentially two haplotypes at any region) can in
principle undermine phasing inference, we included cell doublets in the
simulation (in proportions shown on the X axis, which bracket the observed
doublet rates). Each point shows the proportion of SNPs phased concordantly
with the correct (a priori known) haplotypes (Y axis) for
one simulation (five simulations were performed for each combination of
cell-doublet proportion and percentage of sites observed). c,
Relationship of phasing capability to number of cells analyzed. Data are as
in (b), but for different numbers of simulated cells. All
simulations had an among-cell mean of 1% of heterozygous sites observed.
d, A cell doublet: when two cells (here, sperm DNA florets)
are co-encapsulated in the same droplet, their genomic sequences will be
tagged with the same barcode; such events must be recognized computationally
and excluded from downstream analyses. e, Four example
chromosomes from a cell barcode associated with two sperm cells (a cell
doublet). Black lines: haplotypes; blue circles: observations of alleles,
shown on the haplotype from which they derive. Both parental haplotypes are
present across regions of chromosomes where the cells inherited different
haplotypes. f, Computational recognition of cell doublets in
Sperm-seq data (from an individual sperm donor, NC11). The proportion of
consecutively observed SNP alleles derived from different parental
haplotypes is used to identify cell doublets; this proportion is generally
small (arising from sparse crossovers, PCR/sequencing errors, and/or ambient
DNA) but is much higher when the analyzed sequence comes from a mixture of
two distinct haploid genomes. We use 21 of the 22 autosomes to calculate
this proportion, excluding the autosome with the highest such proportion
given the possibility that a chromosome is aneuploid. The dashed gray line
marks the inflection point beyond which sperm genomes are flagged as
potential doublets and excluded from downstream analysis. Red points
indicate barcodes with coverage of both the X and Y chromosomes (potentially
X+Y cell doublets or XY aneuploid cells); black points indicate barcodes
with one sex chromosome detected (X or Y). The red (XY) cells below the
doublet threshold are XY aneuploid but appear to have just one copy of each
autosome.
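The doublet statistic in panel f can be written down directly. The Python sketch below assumes that each observed allele in a barcode has already been assigned to a parental haplotype (0 or 1) using the phased haplotypes, and it pools switch counts over the 21 retained autosomes; the pooling choice, the function names and the fixed threshold of 0.2 are hypothetical (the study places its cutoff at an inflection point in each donor's distribution).

```python
def switch_fraction(haps_by_chrom: dict[str, list[int]]) -> float:
    """Fraction of consecutively observed alleles that switch parental haplotype.

    haps_by_chrom maps each autosome to the haplotype labels (0/1) of the
    alleles observed in one barcode, in genomic order. The autosome with the
    highest switch proportion is dropped because it might be aneuploid.
    """
    per_chrom = {}
    for chrom, haps in haps_by_chrom.items():
        pairs = len(haps) - 1
        if pairs < 1:
            continue
        switches = sum(a != b for a, b in zip(haps, haps[1:]))
        per_chrom[chrom] = (switches, pairs)
    worst = max(per_chrom, key=lambda c: per_chrom[c][0] / per_chrom[c][1])
    kept = [v for c, v in per_chrom.items() if c != worst]
    return sum(s for s, _ in kept) / sum(p for _, p in kept)


def is_doublet(haps_by_chrom, threshold=0.2):
    # Hypothetical fixed cutoff standing in for a per-donor inflection point.
    return switch_fraction(haps_by_chrom) > threshold


# One crossover per autosome, plus one autosome carrying both haplotypes.
single = {f"chr{i}": [0] * 10 + [1] * 10 for i in range(1, 22)}
single["chr22"] = [0, 1] * 10
# Two genomes mixed under one barcode: haplotypes alternate everywhere.
double = {f"chr{i}": [0, 1, 1, 0] * 5 for i in range(1, 23)}
print(round(switch_fraction(single), 3), is_doublet(single))  # 0.053 False
print(round(switch_fraction(double), 3), is_doublet(double))  # 0.526 True
```

Dropping the worst autosome keeps a single disomic chromosome from making an otherwise normal haploid cell look like a doublet.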
The phased haplotypes determined by Sperm-seq allowed us to identify
cell “doublets” from the presence of both parental haplotypes at
loci on multiple chromosomes (Extended Data Fig.
2d-f, Methods). We also identified surprising “bead
doublets,” in which two beads’ barcodes reported identical
haplotypes genome-wide (ascertained through different SNPs) and thus appeared to have
captured the same gamete genome (Extended Data
Fig. 3a,b, Methods, Supplementary Methods). Bead
doublets were useful for evaluating the replicability of Sperm-seq data and
analyses (Extended Data Fig. 3c-e), which is usually impossible because
single-cell sequencing is inherently destructive.
Extended Data Fig. 3.
Identification and use of “bead doublets.”
a, SNP alleles were inferred genome-wide (for each
sperm genome) by imputation from (i) the subset of alleles detected in each
cell and (ii) Sperm-seq-inferred parental haplotypes. For each pair of sperm
genomes (cell barcodes), the proportion of all SNPs at which they shared the
same imputed allele was estimated. A small but surprising number of such
pairwise comparisons (19 of 984,906 from the donor shown, NC14) indicate
essentially identical genomes (ascertained through different SNPs).
b, We hypothesize that this arises from a heretofore
undescribed scenario we call “bead doublets”, in which two
barcoded beads are co-encapsulated with the same gamete, so that both barcodes
tag the same haploid genome. c, Random pairs of
cell barcodes (here 100 pairs selected from donor NC10) tend to interrogate
few of the same SNPs (left), and tend to detect the same parental haplotype
on average at the expected 50% of the genome (right). d,
“Bead doublet” barcode pairs (here 20 pairs from
donor NC10, who had the median number of bead doublets, left) also
interrogate few of the same SNPs, yet detect identical haplotypes throughout
the genome (right). Results were consistent across donors. e,
Use of “bead doublets” to characterize the concordance of
crossover inferences between distinct samplings of the same haploid genome
by different barcodes. The bead doublets (barcode pairs) were compared to
100 random barcode pairs per donor. Crossover inferences were classified as
“concordant” (overlapping, detected in both barcodes), as
“one SNP apart” (separated by just one SNP, detected in both
barcodes), as “near end of coverage” (within 15 heterozygous
SNPs of the end of SNP coverage at a telomere, where power to infer
crossovers is partial), or as discordant. Error bars (with small magnitude)
show binomial 95% confidence intervals for the number of crossovers per
category divided by number of crossovers total in both barcodes (32,714
crossovers total in 1,201 bead doublet pairs; 67,862 crossovers total in
2,000 random barcode pairs; some barcodes are in multiple bead doublet or
random barcode pairs).
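The pairwise comparison in panel a reduces to a concordance score between imputed genomes. In the Python sketch below, imputation is assumed to have been done already, so each barcode is represented by its imputed parental haplotype (0/1) at every heterozygous SNP; the helper names, the toy barcodes and the 0.99 cutoff are hypothetical.

```python
from itertools import combinations


def imputed_concordance(h1: list[int], h2: list[int]) -> float:
    """Proportion of SNPs at which two barcodes share the same imputed allele."""
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)


def find_bead_doublets(imputed: dict[str, list[int]], min_concordance=0.99):
    """Barcode pairs whose imputed genomes are essentially identical.

    Unrelated sperm share about 50% of imputed alleles; two barcodes tagging
    one genome share essentially all of them. The cutoff is an assumption.
    """
    return [(a, b) for a, b in combinations(sorted(imputed), 2)
            if imputed_concordance(imputed[a], imputed[b]) >= min_concordance]


imputed = {
    "bc1": [0] * 50 + [1] * 50,
    "bc2": [0] * 50 + [1] * 50,             # same genome as bc1
    "bc3": [0] * 25 + [1] * 50 + [0] * 25,  # a different gamete
}
print(find_bead_doublets(imputed))  # [('bc1', 'bc2')]
```

An all-pairs scan grows quadratically with the number of barcodes, which remains manageable at the scale of one donor's barcodes.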
Recombination rate in sperm donors, cells
We identified crossover (recombination) events in each cell as
transitions between the parental haplotypes we had inferred analytically (Methods). We identified 813,122 crossovers in
the 31,228 gamete genomes (Extended Data Table
1). Crossover locations were inferred with a median resolution of 240
kb, with 9,746 (1.2%) inferred within 10 kb (Extended Data Table 1, Supplementary Notes). Analysis of
bead doublets indicated high accuracy of crossover inferences (Extended Data Fig. 3e). Estimates of crossover rate
and location were robust to down-sampling to the same coverage in each cell
(Extended Data Fig. 4, Supplementary Methods).
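Crossover calling as transitions between parental haplotypes can be sketched with a simple run-length rule in Python. This is an illustration under stated assumptions, not the procedure in Methods: observed alleles are assumed to be already labelled by parental haplotype, any run shorter than `min_run` SNPs is treated as noise (a hypothetical rule and parameter), the crossover location is the midpoint between the bounding SNPs (as used for Extended Data Fig. 6a), and resolution is taken to be the distance between those SNPs.

```python
def call_crossovers(positions, haps, min_run=3):
    """Call crossovers as switches between parental haplotypes (sketch).

    positions: sorted bp coordinates of the SNPs observed in one cell on one
    chromosome; haps: haplotype label (0/1) of each observed allele.
    Returns (left_snp, right_snp, midpoint, resolution) for each crossover.
    """
    runs = []  # [haplotype, first_index, last_index]
    for i, h in enumerate(haps):
        if runs and runs[-1][0] == h:
            runs[-1][2] = i
        else:
            runs.append([h, i, i])
    # Discard short runs, treating them as errors rather than double crossovers.
    kept = [r for r in runs if r[2] - r[1] + 1 >= min_run]
    crossovers = []
    for prev, nxt in zip(kept, kept[1:]):
        if prev[0] != nxt[0]:
            left, right = positions[prev[2]], positions[nxt[1]]
            crossovers.append((left, right, (left + right) / 2, right - left))
    return crossovers


positions = [i * 1000 for i in range(12)]
haps = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1]  # lone "1" at 4,000 bp is noise
print(call_crossovers(positions, haps))  # [(7000, 8000, 7500.0, 1000)]
```

With sparse coverage, the bounding SNPs of a true crossover can be far apart, which is why median resolution (240 kb here) is much coarser than the spacing of heterozygous SNPs in the genome.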
Extended Data Fig. 4.
Numbers and locations of crossovers called from down-sampled data (equal
number of SNPs in each cell, randomly chosen).
To eliminate any potential effect of unequal sequence coverage
across donors and cells, down-sampling was used to create data sets with
equal coverage (numbers) of heterozygous SNP observations in each cell.
Crossovers were called from these random equally sized sets of SNPs from all
cells. a and b, Crossover number per cell globally
(a) and per chromosome (b) (785,476 total
autosomal crossovers called from down-sampled SNPs included, 30,778 cells
included, aneuploid chromosomes excluded). c, Density plots of
crossover location with crossover midpoints plotted and area scaled to be
equal to per-chromosome crossover rate. Gray rectangles mark centromeric
regions; coordinates are in hg38. d, Similar numbers of
crossovers were called from full data and equally down-sampled SNP data: we
performed correlation tests across cells for each donor and chromosome to
compare the number of crossovers called from all data to the number of
crossovers called from equal numbers of randomly down-sampled SNPs. The
histogram shows Pearson’s r values for all 460 (20
donors × 23 chromosomes [total number plus number for 22 autosomes]) tests
(n per test = 974–2,274 cells per donor as in
Extended Data Table 1, all
chromosome comparisons Pearson’s r > 0.83,
all two-sided p < 10−300).
e, Crossovers called from equally down-sampled SNP data
were in similar locations to those called from all data: we performed
correlation tests comparing crossover rate in 500 kb bins (cM/500 kb) from
all data vs. equally down-sampled SNP data for each donor and chromosome.
The histogram shows Pearson’s r values for all 460
(20 donors × 23 chromosomes [genome-wide rate plus rate for 22 autosomes])
tests (n per test = number of 500 kb bins per chromosome
[genome-wide: 5,739, chromosomes 1 through 22: 497, 484, 396, 380, 363, 341,
318, 290, 276, 267, 270, 266, 228, 214, 203, 180, 166, 160, 117, 128, 93,
101], all chromosome comparisons Pearson’s r
> 0.87, all two-sided p <
10−300).
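The equal-coverage down-sampling amounts to drawing the same number of heterozygous-SNP observations from every cell. A minimal Python sketch follows, assuming each cell's observations are given as a list of SNP positions and that the target is the smallest per-cell count; the target choice, the function name and the seed are assumptions.

```python
import random


def downsample_equal(snps_by_cell: dict[str, list[int]], seed: int = 0) -> dict[str, list[int]]:
    """Randomly keep the same number of SNP observations in every cell.

    The target is the smallest per-cell count (an assumption; any common
    target no larger than that minimum would do). Kept SNPs stay sorted.
    """
    rng = random.Random(seed)
    target = min(len(snps) for snps in snps_by_cell.values())
    return {cell: sorted(rng.sample(snps, target))
            for cell, snps in snps_by_cell.items()}


cells = {"cell1": list(range(0, 100, 2)), "cell2": list(range(0, 300, 3))}
equal = downsample_equal(cells)
print({cell: len(snps) for cell, snps in equal.items()})  # {'cell1': 50, 'cell2': 50}
```

Crossovers would then be re-called from the down-sampled sets and compared with the full-data calls, as in panels d and e.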
The 20 sperm donors’ recombination rates ranged from
22.2–28.1 crossovers per cell, consistent with estimates from other
methods[3,5,6,10-12,24,26], though with
far more precision at the individual-donor level (95% confidence intervals of
22.0–22.4 to 27.9–28.4 crossovers per cell), due to the large
number of gametes analyzed per donor (Extended
Data Table 1, Extended Data Fig.
5a). Individuals with higher global crossover rates had more
crossovers on average on each chromosome (Extended
Data Fig. 5b). We generated genetic maps for each of the donors from
their 25,839–62,110 observed crossovers; these maps were broadly
concordant with a family-derived paternal genetic map[6] (Extended
Data Fig. 5c,d; Supplementary Notes and Supplementary
Methods).
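How the per-donor confidence intervals were computed is not stated here. As a rough consistency check only, a normal-approximation interval (mean ± 1.96 × SD/√n) using each donor's mean and cell count from Extended Data Table 1, with the median per-cell standard deviation of 4.23 reported in the next paragraph standing in for each donor's own SD, gives intervals matching the reported extremes for NC2 and NC3 (the lowest- and highest-rate donors). The function name is hypothetical.

```python
import math


def mean_ci95(mean: float, sd: float, n: int) -> tuple[float, float]:
    """Normal-approximation 95% CI for a donor's mean crossovers per cell."""
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Means and cell counts from Extended Data Table 1; SD 4.23 is a stand-in.
for donor, mean, n in [("NC2", 22.19, 1680), ("NC3", 28.13, 1289)]:
    lo, hi = mean_ci95(mean, 4.23, n)
    print(donor, f"{lo:.1f}-{hi:.1f}")
# NC2 22.0-22.4
# NC3 27.9-28.4
```

The narrow intervals follow from the large n per donor: with roughly 1,300 to 1,700 gametes, the standard error is about one-fortieth of the per-cell standard deviation.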
Extended Data Fig. 5.
Inter-individual and inter-cell recombination rate from single-sperm
sequencing.
a, Density plot showing per-cell number of autosomal
crossovers for all 31,228 cells (813,122 total autosomal crossovers) from 20
sperm donors (per-donor cell and crossover numbers as in Extended Data Table 1; aneuploid chromosomes
were excluded from crossover analysis). Colors represent a donor’s
mean crossover rate (crossovers per cell) from low (blue) to high (red).
This same mean recombination rate-derived color scheme is used for donors in
all figures. Recombination rate differs among donors (n =
20, Kruskal–Wallis chi-squared = 3,665, df = 19,
p < 10−300). b,
Per-chromosome crossover number in each of the 20 sperm donors (data as in
(a) but shown for individual chromosomes). c,
Per-chromosome genetic map lengths for: (i) each of the 20 sperm donors, as
inferred from Sperm-seq data (colors from blue to red reflect donors’
individual crossover rates as described above); (ii) a male average, as
estimated from pedigrees by deCODE[6] (yellow triangles); (iii) a population average
(including female meioses, which have more crossovers), as estimated from
HapMap data[7] (yellow
circles). The deCODE genetic maps stop 2.5 Mb from the ends of SNP coverage.
d, Physical vs. genetic distances (for individualized sperm
donor genetic maps and deCODE’s paternal genetic map) plotted at 500
kb intervals (hg38). Gray boxes denote centromeric regions (or centromeres
and acrocentric arms). Sperm-seq maps are broadly concordant with deCODE
maps (correlation test results in Supplementary Notes) except at
subtelomeric regions not included in deCODE’s map.
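Genetic map length follows directly from crossover counts: one morgan corresponds to an average of one crossover per gamete, so a chromosome's map length in centimorgans is 100 times its mean number of crossovers per sperm. The Python sketch below builds a cumulative map at 500 kb steps, as plotted in panel d, from crossover midpoints pooled over a donor's cells; the function name and toy inputs are hypothetical, and it ignores the centromeric gaps and ends of SNP coverage noted above.

```python
def cumulative_genetic_map(midpoints, n_cells, chrom_len, step=500_000):
    """Cumulative genetic position (cM) at each `step` bp along a chromosome.

    midpoints: crossover midpoints (bp) pooled over n_cells gametes. Each
    crossover adds 100 / n_cells cM at its position.
    """
    edges = range(step, chrom_len + step, step)
    ordered = sorted(midpoints)
    result, i = [], 0
    for edge in edges:
        while i < len(ordered) and ordered[i] <= edge:
            i += 1
        result.append((min(edge, chrom_len), 100 * i / n_cells))
    return result


# Four gametes with four crossovers in total on a 1.5 Mb toy chromosome.
print(cumulative_genetic_map([200_000, 700_000, 900_000, 1_400_000], 4, 1_500_000))
# [(500000, 25.0), (1000000, 75.0), (1500000, 100.0)]
```

The final value is the map length: 100 cM here, because the toy gametes average one crossover each.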
Much more variation was present at the single-cell level: cells
routinely harbored 17 to 37 crossovers (1st and 99th
percentiles, median across donors), with a standard deviation of 4.23 across
cells (median across donors), vs. a standard deviation of 1.53 across
donors’ crossover rates. Among gametes from the same donor, gametes with
fewer crossovers in half of their genome tended to have fewer crossovers in the
other half of their genome (Pearson’s r = 0.09,
two-sided p = 8 × 10−54 with all
gametes from all donors combined after within-donor normalization; Supplementary Notes).
This relationship, predicted by earlier observations in families[5] and spermatocytes[21], suggests that crossover
number on each chromosome is partly shaped by factors that act nucleus-wide.
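The within-donor normalization can be illustrated in Python. The sketch assumes z-scoring each donor's per-cell counts separately (the normalization actually used is described in the Supplementary Notes and may differ) and a hypothetical two-donor toy data set. Pooling raw counts lets donor-level differences in recombination rate masquerade as a positive cell-level correlation; after normalization, only within-donor covariation remains.

```python
import statistics


def zscore(values):
    """Standardize one donor's per-cell values (mean 0, sample SD 1)."""
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def pooled_within_donor_r(halves_by_donor):
    """halves_by_donor: donor -> list of per-cell (half-1 count, half-2 count)."""
    xs, ys = [], []
    for cells in halves_by_donor.values():
        xs += zscore([first for first, _ in cells])
        ys += zscore([second for _, second in cells])
    return pearson(xs, ys)


# Hypothetical toy data: donor B simply recombines more than donor A.
data = {"A": [(10, 12), (12, 10)], "B": [(20, 22), (22, 20)]}
raw_x = [first for cells in data.values() for first, _ in cells]
raw_y = [second for cells in data.values() for _, second in cells]
print(round(pearson(raw_x, raw_y), 3))        # 0.923
print(round(pooled_within_donor_r(data), 3))  # -1.0
```

In this toy example the raw pooled correlation is strongly positive even though, within each donor, the two genome halves are perfectly anti-correlated.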
Crossover location and interference
All 20 donors shared a tendency to concentrate their crossovers in the
same regions of the genome, with large concentrations of crossovers in distal
regions, as expected from earlier analyses of families[4,6,9,11,30], and more
modest shared enrichments in many centromere-proximal regions (Fig. 2a, Extended Data
Fig. 6). Guided by these empirical patterns, we divided the genome
into “crossover zones,” each bounded by local minima in crossover
density (Extended Data Fig. 6b, Supplementary Methods).
These zones are much larger than the fine-scale, sequence-driven crossover
hotspots[7,31-33], which the spatial resolution of most of our crossover
inferences was not well suited to analyze.
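One way to derive zone boundaries from local minima of crossover density is sketched below in Python: crossover midpoints are binned, the counts are smoothed with a short centered moving average, and a boundary is placed at each interior local minimum. The bin size, window width, tie-breaking rule and function name are assumptions; the actual procedure is described in the Supplementary Methods.

```python
def crossover_zones(midpoints, chrom_len, bin_size, window=3):
    """Split a chromosome at local minima of smoothed crossover density.

    A boundary is placed at the centre of each interior bin whose smoothed
    count is lower than its left neighbour and no higher than its right one.
    Returns (start, end) intervals covering the chromosome.
    """
    n_bins = -(-chrom_len // bin_size)  # ceiling division
    counts = [0] * n_bins
    for m in midpoints:
        counts[min(int(m // bin_size), n_bins - 1)] += 1
    half = window // 2
    smooth = []
    for i in range(n_bins):
        lo, hi = max(0, i - half), min(n_bins, i + half + 1)
        smooth.append(sum(counts[lo:hi]) / (hi - lo))
    cuts = [i * bin_size + bin_size / 2 for i in range(1, n_bins - 1)
            if smooth[i] < smooth[i - 1] and smooth[i] <= smooth[i + 1]]
    edges = [0, *cuts, chrom_len]
    return list(zip(edges, edges[1:]))


# Toy chromosome of length 100 with crossovers concentrated near both ends.
midpoints = ([5] * 5 + [15] * 8 + [25] * 5 + [35]
             + [65] + [75] * 5 + [85] * 8 + [95] * 5)
print(crossover_zones(midpoints, 100, 10))  # [(0, 45.0), (45.0, 100)]
```

Smoothing keeps sampling noise in sparsely populated bins from producing spurious boundaries; a wider window yields fewer, larger zones.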
Fig. 2.
Variation in crossover positioning and crossover separation
(interference).
Color indicates crossover rate of donor or cell (blue: low, red: high).
a, Crossover location density plots for each donor
(n = 20). Dashed gray vertical lines: crossover zone
boundaries. b-e, Crossover positioning and separation
(interference) on chromosomes with two crossovers. b-c,
Inter-individual variation among n = 20 sperm donors. Error
bars: 95% confidence intervals. b, Left, per-cell proportion of
crossovers in the most distal crossover zones (Kruskal–Wallis chi-squared
= 1,034, df = 19, p = 2 ×
10−207). Right, mean crossover rate (x axis) vs. the
proportion of all crossovers (on two-crossover chromosomes) occurring in distal
zones (y axis, total proportion) (Pearson’s r =
−0.95, two-sided p = 8 ×
10−11). c, Left, density plot of separation
between consecutive crossovers (Kruskal–Wallis chi-squared = 1,792,
df = 19, p <
10−300). Right, mean crossover rate (x axis) vs. median
crossover separation (y axis) on two-crossover chromosomes (Pearson’s
r = −0.95, two-sided p = 7 ×
10−11). d-e, Among-cell
covariation of crossover rate with distal zone use (d) or crossover
interference (e). Phenotypes are analyzed as percentiles relative
to sperm from the same donor. Boxplots: midpoints, medians; boxes,
25th and 75th percentiles; whiskers, minima and
maxima. d, Single-cell distal-zone use (the proportion of
crossovers on two-crossover chromosomes that are in the most distal zones) vs.
crossover rate (n cells per decile = 3,152, 3,080, 3,101 for
first, fifth, and tenth deciles, respectively; Mann–Whitney
W = 5,271,934.5, two-sided p = 2 ×
10−9 between first and tenth deciles). e,
Single-cell crossover separation (the median, across a cell’s two-crossover
chromosomes, of the fraction of the chromosome separating the two crossovers) vs.
crossover rate (Mann–Whitney W = 148,548,161, two-sided
p = 3 × 10−53 between first
[n = 11,658] and tenth [n = 23,154]
deciles; all inter-crossover separations used in test).
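The two single-cell phenotypes in panels d and e can be computed per gamete as follows. This Python sketch assumes crossover midpoints per chromosome and a hypothetical `distal_zones` input giving the two most distal crossover zones of each chromosome; it keeps only two-crossover chromosomes, as the caption specifies, and expresses separation as a fraction of chromosome length. The function name and toy values are illustrative.

```python
import statistics


def cell_phenotypes(cell, chrom_lengths, distal_zones):
    """Per-cell distal-zone use and crossover separation (sketch).

    cell: chrom -> sorted crossover midpoints (bp) in one gamete.
    distal_zones: chrom -> ((start, end), (start, end)), the two most distal
    crossover zones (hypothetical input derived from zone boundaries).
    Only chromosomes with exactly two crossovers are used.
    """
    in_distal, total, separations = 0, 0, []
    for chrom, xos in cell.items():
        if len(xos) != 2:
            continue
        for x in xos:
            total += 1
            in_distal += any(s <= x < e for s, e in distal_zones[chrom])
        separations.append((xos[1] - xos[0]) / chrom_lengths[chrom])
    if not separations:
        return None, None
    return in_distal / total, statistics.median(separations)


chrom_lengths = {"chrA": 100, "chrB": 200}
distal = {"chrA": ((0, 10), (90, 100)), "chrB": ((0, 20), (180, 200))}
cell = {"chrA": [5, 50], "chrB": [30, 190], "chrC": [1]}  # chrC is skipped
distal_use, separation = cell_phenotypes(cell, chrom_lengths, distal)
print(distal_use, round(separation, 3))  # 0.5 0.625
```

Ranking these values against other sperm from the same donor would then give the within-donor percentiles used in panels d and e.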
Extended Data Fig. 6.
Distributions of crossover locations along chromosomes (in
“crossover zones”).
a, Each donor’s crossover locations are plotted
as a colored line; color indicates the donor’s overall crossover rate
(blue: low, red: high); gray boxes show the locations of centromeres (or,
for acrocentric chromosomes, centromeres and p arms). The
midpoint between the SNPs bounding each inferred crossover was used as the
position for each crossover in all analyses. To combine data across
chromosomes, crossover locations (density plot) are shown on
“meta-chromosomes” in which crossover locations are normalized
to the length of the chromosome or arm on which they occurred. For
acrocentric chromosomes, only the q arm was considered; for
non-acrocentric chromosomes, the p and q
arms were allotted space in proportion to the fraction of the non-acrocentric
genome (in bp) that they comprise, with the centromere placed at the summed
p arms’ fraction of the bp of these chromosomes.
Crossover locations were first converted to the proportion of the arm at
which they fall, then these positions normalized to the genome-wide
p or q arm proportion. b,
Identification of chromosomal zones of recombination use (“crossover
zones”) from all donors’ crossovers for 22 autosomes. Density
plots of crossover location for all sperm donors’ total 813,122
crossovers (aneuploid chromosomes excluded; crossover location is the
midpoint between SNPs bounding crossovers) along autosomes (hg38) are shown.
Crossover zones (bounded by local minima of crossover density) are shown by
alternating shades of gray. Diagonally-hatched rectangles indicate
centromeres (or centromeres and acrocentric arms).
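The meta-chromosome coordinates in panel a can be sketched as a piecewise-linear mapping. This Python version assumes a single-point centromere, an externally supplied `p_fraction` (the summed p arms' share of non-acrocentric bp), and, for acrocentric chromosomes, a q arm spread over the whole 0-1 axis; that last choice is our reading of the caption rather than a stated detail, and the function name is hypothetical.

```python
def meta_position(pos, centromere, chrom_len, p_fraction, acrocentric=False):
    """Map a crossover midpoint onto a 0-1 "meta-chromosome" (sketch).

    Non-acrocentric: the p arm occupies [0, p_fraction] and the q arm
    [p_fraction, 1]. Acrocentric: only q-arm crossovers, mapped onto [0, 1].
    """
    if acrocentric:
        if pos < centromere:
            raise ValueError("only q-arm crossovers are used on acrocentrics")
        return (pos - centromere) / (chrom_len - centromere)
    if pos < centromere:
        return pos / centromere * p_fraction  # proportion along the p arm
    q_prop = (pos - centromere) / (chrom_len - centromere)
    return p_fraction + q_prop * (1 - p_fraction)


print(round(meta_position(25, 50, 150, 0.4), 3),
      round(meta_position(100, 50, 150, 0.4), 3))  # 0.2 0.7
```

Normalizing by arm rather than by whole chromosome keeps centromeres aligned across chromosomes, so centromere-proximal crossover enrichments line up in the combined density plot.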
Intriguingly, the crossover zones with the most variable usage across
people were all adjacent to centromeres; individuals with high recombination
rates used these zones much more frequently (Fig.
2a, Extended Data Fig. 6a; with
simulated equal SNP coverage, Extended Data Fig.
4c,e). The relative usage of
distal and proximal zones varied greatly among donors and was correlated with
donors’ recombination rates (Extended Data
Fig. 7). These results were robust to alternative definitions of
“distal” vs. “proximal” (Extended Data Fig. 7c, Supplementary Notes).
Extended Data Fig. 7.
Crossover placement in end zones, and crossover separation, vary in ways
that correlate with crossover rate – among sperm donors and among
individual gametes.
Analyses are shown by donor (a-h,
n = 20 sperm donors) or by individual gamete
(i-j, n = 31,228 gametes). In
a-h, the left panels show the phenotype distributions for
individual donors, and the right panels show the relationship to the
donors’ crossover rates. To control for the effect of the number of
crossovers, the analyses in panels c, d, and g-j
use “two-crossover chromosomes” – chromosomes on which
exactly two crossovers occurred. For scatter plots (a-h,
right), all x axes show mean crossover rate and all error bars are 95%
confidence intervals (y axes are described per panel). a and
b, The proportion of crossovers falling in the most distal
chromosome crossover zones (a) and crossover separation
(b) – a readout of crossover interference, the
distance between consecutive crossovers (Mb) – vary among 20 sperm
donors (left panels; per-cell proportion of crossovers in end zones,
among-donor Kruskal–Wallis chi-squared = 2,334, df =
19, p < 10−300; all distances
between consecutive crossovers among-donor Kruskal–Wallis chi-squared
= 3,309, df = 19, p <
10−300). Right panels show both properties (y axes,
total proportion of crossovers in distal zones and median crossover
separation, respectively) vs. donor’s crossover rate (Correlation
results for 20 sperm donors: proportion of all crossovers across cells in
distal zones Pearson’s r = −0.95, two-sided
p = 2 × 10−10; median crossover separation
Pearson’s r = −0.96, two-sided
p = 1 × 10−11).
c, An alternative method for the proportion of crossovers
in the distal regions of chromosomes: proportion of crossovers in the distal
50% of chromosome arms varies across donors (left, among-donor
Kruskal–Wallis chi-squared = 2,209, df = 19,
p < 10−300) and negatively
correlates with recombination rate (right, Pearson’s
r = −0.92, two-sided p = 2
× 10−8; y axis shows actual proportion of
crossovers in distal 50%). d, As in (c), but with
proportion of crossovers from two-crossover chromosomes occurring in the
distal 50% of chromosome arms. Left, among-donor Kruskal–Wallis
chi-squared = 1,058, df = 19, p = 2
× 10−212; right, correlation with recombination
rate Pearson’s r = −0.93, two-sided
p = 4 × 10−9. e, As in
(b) but for consecutive crossovers on the
same arm of the chromosome. Left, among-donor
Kruskal–Wallis chi-squared = 346, df = 19,
p = 7 × 10−62; right,
correlation with recombination rate Pearson’s r =
−0.90, two-sided p = 5 × 10−8.
f, As in (b) but for consecutive crossovers on
opposite chromosome arms (i.e. that span the centromere).
Left, among-donor Kruskal–Wallis chi-squared = 1,554,
df = 19, p <
10−300; right, correlation with recombination rate
Pearson’s r = −0.96, two-sided
p = 3 × 10−11. g, As in
(e) but for distances between consecutive crossovers on
two-crossover chromosomes. Left, among-donor Kruskal–Wallis
chi-squared = 181, df = 19, p = 2 ×
10−28; right, correlation with recombination rate
Pearson’s r = −0.88, two-sided
p = 3 × 10−7. h, As in
(f) but for distances between consecutive crossovers on
two-crossover chromosomes. Left, among-donor Kruskal–Wallis
chi-squared = 930, df = 19, p = 5 ×
10−185; right, correlation with recombination rate
Pearson’s r = −0.92, two-sided
p = 1 × 10−8. i,
j, Boxplots show medians and interquartile ranges with
whiskers extending to 1.5 times the interquartile range from the box. Each
point is a cell. i, Within-donor percentile of proportion of
crossovers from two-crossover chromosomes falling in distal zones plotted
vs. crossover rate decile. Groups are deciles of crossover rate normalized
by converting each cell’s crossover count to a percentile
within-donor (All cells from all donors shown together, n
cells in deciles = 3,152, 3,122, 3,276, 3,067, 3,080, 3,073, 3,135, 3,132,
3,090, 3,101, respectively [31,228 total]). Because the underlying data are
proportions with small denominators, their discreteness is evident as pileups
at certain values. j, Crossover interference from two-crossover
chromosomes (median consecutive crossover separation per cell shown). Each
point represents the median of all percentile-expressed distances between
crossovers from all two-crossover chromosomes in one cell (percentile taken
within-chromosome), groupings and ns as in
(i).
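The two-crossover-chromosome metric in panel d can be illustrated with a short sketch. It assumes that each chromosome is summarized by its crossover positions, a single centromere coordinate and its length. Real centromeres span megabases, and acrocentric p arms would need special handling. The input layout is hypothetical.

```python
from typing import Iterable, List, Tuple

# (crossover positions in bp, centromere position in bp, chromosome length in bp)
Chromosome = Tuple[List[float], float, float]


def arm_fraction(pos: float, centromere: float, chrom_length: float) -> float:
    """0 at the centromere, 1 at the telomere of the arm the crossover falls on."""
    if pos < centromere:
        return (centromere - pos) / centromere                 # p arm
    return (pos - centromere) / (chrom_length - centromere)    # q arm


def distal_half_proportion(chromosomes: Iterable[Chromosome]) -> float:
    """Share of crossovers on two-crossover chromosomes lying in the distal 50% of an arm."""
    in_distal = total = 0
    for positions, centromere, length in chromosomes:
        if len(positions) != 2:      # keep only two-crossover chromosomes
            continue
        for pos in positions:
            total += 1
            in_distal += arm_fraction(pos, centromere, length) > 0.5
    return in_distal / total if total else float("nan")
```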
Positive crossover interference causes crossovers in the same meiosis to
be further apart than they would be if crossovers were independent
events[26,30,34,35]. The effect
of crossover interference was visible in each of the 20 sperm donors (Extended Data Fig. 8, Supplementary Methods). Crossover
separation varied greatly among sperm donors and correlated inversely with
recombination rate (Extended Data Fig.
7b), results that were robust to chromosome composition and that applied
similarly to same-arm and opposite-arm crossover pairs (Extended Data Fig. 7e,f, Supplementary
Notes).
Extended Data Fig. 8.
Crossover interference in individual sperm donors and on
chromosomes.
a, Solid lines show density plots (scaled by
donor’s crossover rate) of the observed distance (separation) between
consecutive crossovers as measured in the proportion of the chromosome
separating them (left) and in genomic (Mb) distance (right), one line per
donor (n = 20). Dashed lines show the distance between
consecutive crossovers when crossover locations are permuted randomly across
cells to remove the effect of crossover interference. b, The
median of observed distances between consecutive crossovers for one donor
(NC18, 10th lowest recombination rate of 20 donors; blue dashed
line) is shown with a histogram of the medians of n =
10,000 among-cell crossover permutations (both permutation one-sided
ps < 0.0001). Units, proportion of the
chromosome (left) and genomic (Mb) distance (right). c,
Crossover separation on example chromosomes; plots and ns
are as in (b). (Permutation one-sided p
< 0.0001 for all chromosomes in all sperm donors except occasionally
chromosome 21, where especially few double crossovers occur).
d, Median distances between donor NC18’s consecutive
crossovers for each autosome for all inter-crossover distances (top) and
inter-crossover distances only from chromosomes with two crossovers
(bottom). Units are proportion of the chromosome (left) and genomic (Mb)
distance (right). e, Schematic: analyzing crossover
interference in individualized genetic distance (one 20 cM window shown)
using a donor’s own recombination map. f, When
parameterized using each donor’s own genetic map, sperm
donors’ crossover interference profiles across multiple genetic
distance windows (as shown in e) do not differ (n
= 20 sperm donors, Kruskal–Wallis chi-squared = 0.22,
df = 19, p = 1 using 20 estimates [cM
distances] for each of 20 donors). Error bars, binomial 95% confidence
intervals on proportion of cells with a second crossover in the window
given. This suggests that inter-individual variation in crossover
interference, while substantial when measured in base pairs, is negligible
when measured in donor-specific genetic distance, pointing to a shared
influence upon crossover interference and crossover rate.
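The permutation logic of panels a–c can be sketched as follows for a single chromosome. This is our illustration, not the authors' implementation. Preserving each gamete's crossover count and using an add-one p-value correction are both assumptions.

```python
import random
import statistics
from typing import List, Tuple


def median_separation(cells: List[List[float]]) -> float:
    """Median distance between consecutive crossovers; one position list per gamete."""
    dists = []
    for positions in cells:
        ordered = sorted(positions)
        dists.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return statistics.median(dists)


def interference_permutation_test(cells: List[List[float]], n_perm: int = 10_000,
                                  seed: int = 0) -> Tuple[float, float]:
    """One-sided test that observed crossovers are farther apart than under independence.

    Crossover positions are pooled across gametes and reassigned at random, keeping
    each gamete's crossover count, which removes any within-meiosis interference.
    """
    rng = random.Random(seed)
    observed = median_separation(cells)
    pool = [pos for positions in cells for pos in positions]
    sizes = [len(positions) for positions in cells]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        permuted, start = [], 0
        for size in sizes:
            permuted.append(pool[start:start + size])
            start += size
        if median_separation(permuted) >= observed:
            hits += 1
    # Add-one correction keeps the p value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```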
The extremely strong correlations of donors’ crossover rates with
crossover locations and interference could arise from an underlying biological
factor that coordinates these phenotypes, or could arise trivially from the fact
that chromosomes with more crossovers would also tend to have crossovers more
closely spaced and in more regions. To distinguish between these possibilities,
we focused on data from the 180,738 chromosomes with exactly two crossovers
(here called “two-crossover chromosomes”; Supplementary Notes). Even in this
two-crossover chromosome analysis, distal-zone usage (Fig. 2b) and crossover separation (Fig. 2c) correlated strongly and negatively with
genome-wide recombination rate (additional control analyses described in Supplementary Notes and
Extended Data Fig. 7d,g,h). These
relationships indicate that a donor’s crossover-location and
crossover-spacing phenotypes reflect underlying biological factors that vary
from person to person, as opposed to resulting indirectly from the number of
crossovers on a chromosome.
To test whether this co-variation of diverse meiotic phenotypes also
governs variation at the single-gamete level, we asked whether cells with more
crossovers than the average for their donor also exhibit the same kinds of
crossover-spacing and crossover-location phenotypes that donors with high
crossover rates do (Supplementary Methods). Indeed, two-crossover chromosomes from cells
with more crossovers tended to have closer crossover spacing and increased
relative use of non-distal zones (Fig.
2d,e, Extended Data Fig. 7i,j; unnormalized results in Supplementary Notes). This result
indicates that the correlated meiotic-outcome biases that distinguish people
from one another also distinguish the gametes within each individual
(Discussion).
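Cells can be pooled across donors after each cell's crossover count is converted to a within-donor percentile, as in Extended Data Fig. 7i,j. The sketch below illustrates that step. Average-rank tie handling and ceiling-based decile labels are our assumptions and may differ from the convention used for the figure.

```python
import math
from typing import List


def within_donor_percentiles(counts: List[int]) -> List[float]:
    """Percentile (0-100] of each cell's crossover count within its donor; ties share the mean rank."""
    n = len(counts)
    order = sorted(range(n), key=lambda i: counts[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and counts[order[j + 1]] == counts[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [100 * r / n for r in ranks]


def decile(percentile: float) -> int:
    """Map a percentile in (0, 100] to a decile label 1-10."""
    return min(10, max(1, math.ceil(percentile / 10)))
```

Running this per donor and then pooling the labels lets cells from high- and low-recombination donors share the same decile axis.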
Chromosome and sperm donor aneuploidy
Aneuploidy generally arises from a chromosome mis-segregation that
yields two aneuploid cells: one in which that chromosome is absent (a loss), and
one in which it is present in two copies (a gain). Among the 31,228 gametes, we
found 787 whole-chromosome aneuploidies and 133 chromosome arm-scale gains and
losses (2.5% and 0.4% of cells, respectively, Fig.
3a, Methods). All chromosomes
and sperm donors were affected. The sex chromosomes and acrocentric chromosomes
had the highest rates of aneuploidy, consistent with fluorescence in
situ hybridization analysis-based estimates[18,19] (Fig. 3b).
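The percentages above are event counts divided by the number of cells. The sketch below reproduces that arithmetic and adds a 95% interval. It uses the Wilson score interval, which is our choice and may differ slightly from the binomial interval method used for the figures.

```python
import math
from typing import Tuple


def wilson_interval(events: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Point estimate and Wilson score 95% interval for a binomial proportion."""
    p = events / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half, centre + half


N_CELLS = 31_228
whole, lo_w, hi_w = wilson_interval(787, N_CELLS)   # whole-chromosome aneuploidies
arm, lo_a, hi_a = wilson_interval(133, N_CELLS)     # arm-scale gains and losses
print(f"whole-chromosome: {whole:.2%} ({lo_w:.2%} to {hi_w:.2%})")
print(f"arm-scale: {arm:.2%} ({lo_a:.2%} to {hi_a:.2%})")
```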
Fig. 3.
Aneuploidy in sperm from 20 sperm donors.
a, Example chromosomal ploidy analyses. Thick dark gray
line: DNA copy number measurement (normalized sequence coverage in 1 Mb bins);
blue (haplotype 1) and yellow (haplotype 2) vertical lines: observed
heterozygous SNP alleles, plotted with 90% transparency; gray vertical boxes:
centromeres (hg38). b-e, Frequencies (number of events divided by
number of cells) of various aneuploidy categories. n = 23
chromosomes (b, d) and n = 20 donors
(c, e). Error bars are 95% binomial confidence
intervals. b, Frequencies of whole-chromosome losses (x axis) vs.
gains (y axis) for each chromosome (excluding XY Pearson’s
r = 0.88, two-sided p = 7 ×
10−8; including XY [inset] Pearson’s
r = 0.99, two-sided p <
10−300). c, Per-sperm-donor aneuploidy rates
(axes as in b) (excluding XY [not shown] Pearson’s
r = 0.51, two-sided p = 0.02; including XY
Pearson’s r = 0.62, two-sided p =
0.003). d, Frequencies of whole-chromosome gains occurring during
MI (x axis) and MII (y axis) for each chromosome (excluding XY Pearson’s
r = 0.32, two-sided p = 0.15; including XY
[inset] Pearson’s r = 0.85, two-sided p
= 3 × 10−7). e, Frequencies of
whole-chromosome gains occurring during MI (x axis) and MII (y axis) for each
donor (axes as in d) (excluding XY [not shown] Pearson’s
r = 0.06, two-sided p = 0.80; including XY
Pearson’s r = 0.17, two-sided p =
0.47). f, Example genomic anomalies detected in sperm cells,
plotted as in (a).
The 20 young (18–38-year-old) sperm donors, considered by
clinical criteria to have normal-range sperm parameters, exhibited aneuploidy
frequencies ranging from 0.010 to 0.046 aneuploidy events per cell (Fig. 3c, Extended Data Table 1). Permutation tests indicated that this
4.5-fold variation in observed aneuploidy rates reflected genuine
inter-individual variation (one-sided p < 0.0001, Supplementary Notes).
Under the prevailing model for the origins of aneuploidy, sperm with
chromosome losses and gains should be equally common. However, we observed
2.4-fold more chromosome losses than chromosome gains (554 losses vs. 233 gains,
proportion test two-sided p = 2 ×
10−30). This asymmetry did not appear to reflect technical
ascertainment bias (Extended Data Fig. 9a,
Supplementary
Notes). This surprising result is further considered in the Supplementary
Discussion.
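Under the prevailing model, each aneuploidy event is equally likely to be a loss or a gain, so the split of the 787 events can be tested against a 50:50 expectation. The sketch below uses an exact binomial test instead of the proportion test reported above. Its p value is therefore close to, but need not match, the reported figure.

```python
from math import comb


def binomial_two_sided_p_half(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    tail_start = max(k, n - k)
    # Symmetric null: double the upper tail, computed exactly with integers.
    upper = sum(comb(n, i) for i in range(tail_start, n + 1))
    return min(1.0, 2 * upper / 2 ** n)


losses, gains = 554, 233
ratio = losses / gains
p_value = binomial_two_sided_p_half(losses, losses + gains)
print(f"losses/gains = {ratio:.1f}, exact binomial two-sided p = {p_value:.1e}")
```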
Extended Data Fig. 9.
Relationships of aneuploidy frequency to chromosome size and
recombination.
a, The across-donor per-cell frequency of chromosome
losses (left) and gains (center), plotted against the length of the
chromosome (hg38; for losses across n = 22 chromosomes,
Pearson’s r = −0.29, two-sided
p = 0.19 and for gains across n = 22
chromosomes, Pearson’s r = −0.23, two-sided
p = 0.30). Right, the per-chromosome rate of losses
exceeding gains (number of losses minus number of gains divided by number of
cells) is plotted against the length of the chromosomes (across
n = 22 chromosomes, Pearson’s r
= −0.29, two-sided p = 0.19). Red labels,
acrocentric chromosomes. Error bars, 95% binomial confidence intervals on
per-cell frequency (number of events / number of cells, all 31,228 cells
included). b-d, Relationship between aneuploidy frequency and
recombination. Only autosomal whole-chromosome aneuploidies are included.
b, Left, Total number of crossovers on MI nondisjoined
chromosomes (blue line; crossovers analyzed, called as transitions between
the presence of one haplotype and both haplotypes on the gained chromosome)
compared to n = 10,000 donor- and chromosome-matched sets
(35 × 2 chromosomes per set) of properly segregated chromosomes (gray
histogram; permutation). (54 total crossovers on MI gains vs. 84.2 mean
total crossovers on sets of matched chromosomes, one-sided permutation
p < 0.0001, for the hypothesis that gained
chromosomes have fewer crossovers). Right, as left but for gains occurring
during MII (71 MII-derived gained chromosomes of one whole copy from all
individuals with fewer than 5 crossovers called on gained chromosome).
(One-sided permutation p = 0.98 for MII from
n = 10,000 permutations, for the hypothesis that gained
chromosomes have fewer crossovers; sister chromatids nondisjoined in MII
capture all crossovers whereas matched chromosomes do not: matched
simulations and homologs nondisjoined in MI capture only a random half of
crossovers occurring on that chromosome in the parent spermatocyte).
c, Crossovers per non-aneuploid megabase from each cell
from each donor, split by aneuploidy status (n cells = 498,
50, 92, 30,609, left-to-right; “euploid” excludes cells with
any autosomal whole- or partial-chromosomal loss or gain and
“gains” includes gains of one or more than one chromosome
copy; Mann–Whitney test W = 7,264,117, 722,191,
1,370,376; two-sided p = 0.07, 0.49, 0.66 for all autosomal
aneuploidies, meiosis I (MI) gains, and meiosis II (MII) gains,
respectively, all compared against euploid). Each cell is one point;
boxplots show medians and interquartile ranges with whiskers extending to
1.5 times the interquartile range from the box. d, Per-cell
crossover rates vs. per-cell aneuploidy (loss and gain) rates,
n = 20 donors (colored by crossover rate).
p values shown in subtitles are for two-sided
Pearson’s correlation tests. Error bars are 95% confidence intervals
on mean crossover rate (x axis) and on observed aneuploidy frequency (y
axis).
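The matched-control comparison in panel b can be sketched as below. The data structures are hypothetical. Drawing two controls per gain, without replacement within a gain but independently across gains, is our reading of "35 × 2 chromosomes per set". As the caption notes, such matched controls are an imperfect null for MII gains.

```python
import random
from typing import Dict, List, Tuple

Gain = Tuple[str, str, int]          # (donor, chromosome, crossovers called on the gain)
Pool = Dict[Tuple[str, str], List[int]]


def matched_crossover_test(gains: List[Gain], pool: Pool, n_perm: int = 10_000,
                           seed: int = 0) -> Tuple[int, float, float]:
    """Compare crossovers on MI gains with donor- and chromosome-matched controls.

    For each gain, two properly segregated copies of the same chromosome from the
    same donor are drawn, standing in for the two nondisjoined homologs.
    Returns (observed total, mean matched total, one-sided p for 'fewer crossovers').
    """
    rng = random.Random(seed)
    observed = sum(xo for _, _, xo in gains)
    null_totals = []
    for _ in range(n_perm):
        total = 0
        for donor, chrom, _ in gains:
            total += sum(rng.sample(pool[(donor, chrom)], 2))
        null_totals.append(total)
    hits = sum(total <= observed for total in null_totals)
    return observed, sum(null_totals) / n_perm, (hits + 1) / (n_perm + 1)
```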
Errors in chromosome segregation can occur at meiosis I (MI), when
homologs generally separate, or at meiosis II (MII), when sister chromatids
separate. Because recombination occurs in MI prior to disjunction but does not
occur at centromeres, errors during MI result in chromosomes with different
(homologous) haplotypes at their centromeres, whereas sister chromatids
nondisjoined in MII have the same (sister) haplotype at their centromeres (Fig. 3a). (Sex chromosomes X and Y disjoin in
MI, and the sister chromatids of X and Y disjoin at MII.) Encouragingly, for
chromosome 21 – the principal chromosome for which earlier estimates were
possible – our finding of 33% MI events and 67% MII events matched
previous estimates from trisomy 21 patients with paternal-origin gains[36].
Across all chromosomes, MI gains and MII gains had very different
relative frequencies in different individuals and on different chromosomes
(Fig. 3d,e). For example, sex chromosomes were 2.2 times more likely to be
affected in MI than MII, whereas autosomes were 2.0 times more likely to be
affected in MII than MI (proportion test two-sided p = 1.3
× 10−6). The lack of correlation between MI and MII
vulnerabilities (Fig. 3d,e) indicated that MI and MII are differentially
challenging to different chromosomes and to different people.
Although crossovers are required for proper chromosomal
segregation[37] and seem
protective against nondisjunction in maternal meiosis, in which chromosomes are
maintained in diplotene of meiosis I for decades[8], the relationship of crossovers to
aneuploidy is less clear in paternal meiosis[24,36,38-41]. We found that chromosome gains originating in MI
– when recombination occurs – had 36% fewer total crossovers than
matched, well-segregated chromosomes did (Supplementary Methods), suggesting
that crossovers protected against MI nondisjunction of the chromosomes on which
they occurred (Extended Data Fig. 9b,
Supplementary
Notes). No similar relationship was observed for MII gains (though the
simulated control distribution for MII is inherently less accurate, Supplementary Notes) or
at other levels of aggregation (Extended Data
Fig. 9b-d, Supplementary Notes).
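The MI/MII assignment described above depends on the haplotype state at the centromere of a gained chromosome. Below is a minimal sketch that assumes alleles near the centromere have already been phased to parental haplotype 1 or 2. The 20% minor-haplotype threshold is a hypothetical parameter, not a value from this study.

```python
from typing import Sequence


def classify_gain(centromere_haplotypes: Sequence[int], min_minor_fraction: float = 0.2) -> str:
    """Assign a whole-chromosome gain to MI or MII from centromere-proximal alleles.

    Both parental haplotypes at the centromere -> homologs failed to separate (MI).
    A single haplotype -> sister chromatids failed to separate (MII).
    """
    n = len(centromere_haplotypes)
    if n == 0:
        return "undetermined"
    hap1 = sum(1 for h in centromere_haplotypes if h == 1)
    hap2 = sum(1 for h in centromere_haplotypes if h == 2)
    minor = min(hap1, hap2) / n
    return "MI" if minor >= min_minor_fraction else "MII"
```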
Other chromosome-scale genomic anomalies
Many sperm had complex patterns of aneuploidy that could not be
explained by the canonical single-chromosome mis-segregation event. We detected
19 gametes that had three, instead of one, copies of entire or nearly entire
chromosomes (2, 15, 20, and 21) (Fig. 3f,
Extended Data Fig. 10a,b). Chromosome 15 was particularly likely to be
present in two extra copies; in fact, sperm with three copies of all or most of
chromosome 15 (n = 10) outnumbered sperm with two copies of
chromosome 15 (n = 2) (Fisher’s exact test vs. Poisson
two-sided p = 2 × 10−7, Supplementary Notes).
Extended Data Fig. 10.
Additional examples of non-canonical aneuploidy events detected with
Sperm-seq, including those shown in Fig.
3f.
Copy number, SNPs, haplotypes, and centromeres are plotted as in
Fig. 3a. Donor and cell identity
are noted in the panel subtitles. Coordinates are in hg38. Chromosomes 2,
20, 21 (a) and 15 (b) are sometimes present in 3
copies in an otherwise haploid sperm cell. c, A distinct
triplication of much of chromosome 15, from ~33 Mb onwards but not
including the proximal part of the q arm, recurs in
cells from 3 donors. d, Chromosome arm-level losses (top) and
gains (including in more than one copy, bottom three panels, and a compound
gain of the p arm and loss of the q arm,
top panel).
Other gametes carried anomalies encompassing incomplete chromosomes.
These included: one cell that gained the p arm of chromosome 4
while losing the q arm; cells with gains of two copies of a
chromosome arm; and cells with losses of chromosome arms (Fig. 3f; Extended Data
Fig. 10c,d). One cell carried
at least eight copies of most of the q arm of chromosome 4
(Fig. 3f). This gamete – which
we estimate contained almost a billion base pairs of extra DNA – carried
both parental haplotypes of chromosome 4, though almost all of the ~8 copies
came from just one of the parental haplotypes (93% of observed alleles in the
amplified region were haplotype 2). Diverse mutational processes likely generate
these genomic anomalies (Supplementary Discussion).
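The "almost a billion base pairs" estimate follows from simple arithmetic. In the sketch below, the amplified span is an assumed round figure of roughly 130 Mb, which corresponds to most of the chromosome 4 q arm (on the order of 140 Mb in hg38). It is an illustration, not the authors' measured interval.

```python
def extra_bp(region_bp: float, copies: int, baseline_copies: int = 1) -> float:
    """Extra DNA (bp) in a haploid gamete carrying `copies` of a region."""
    return (copies - baseline_copies) * region_bp


ASSUMED_AMPLIFIED_BP = 130e6      # hypothetical: most of the chromosome 4 q arm
extra = extra_bp(ASSUMED_AMPLIFIED_BP, copies=8)
print(f"~{extra / 1e9:.2f} Gb of extra DNA")
```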
Discussion
Inter-individual variation in crossover rates has previously been inferred
from SNP data from families[2-7,9-12]. Here,
highly parallel single-gamete sequencing revealed that donors with high crossover
rates also exhibit closer crossover spacing, even when controlling for the number of
crossovers actually made on a chromosome. These analyses indicate that crossover rate,
spacing and placement covary because of shared biological factors rather than as a
trivial consequence of the number of crossovers.
These same constellations of correlated meiotic crossover phenotypes
– low interference, high rates, use of centromere-proximal zones –
tended to characterize the same gametes from any donor. Cells with more crossovers
in half of their genome tended to have more crossovers in the other half, tended to
have made consecutive pairs of crossovers closer together in genomic distance
– even when making just two crossovers on a chromosome – and tended to
have placed proportionally more of their crossovers in non-distal chromosomal
regions.
What could cause these meiotic phenotypes to covary across chromosomes, in
individual cells, and among people? The physical length of chromosomes during
meiosis, which reflects their compaction, has been observed to vary up to two-fold
among individual spermatocytes while being strongly correlated across chromosomes in
the same spermatocyte; spermatocytes with more-compacted chromosomes also generally
have fewer incipient crossovers[20,21,42]. A unifying model (Extended
Data Fig. 11) explains the covariance of these meiotic phenotypes while
providing a candidate mechanism for inter-individual variation: cell-to-cell
variation in the compaction of meiotic chromosomes – and person-to-person
variation in the average degree of this compaction – would cause these
phenotypes to co-vary in the manner observed in Fig.
2b-e.
Extended Data Fig. 11.
Single-cell and person-to-person variation in diverse meiotic phenotypes
may be governed by variation in the physical compaction of chromosomes
during meiosis.
Previous work shows that the physical length of the same chromosome
varies among spermatocytes at the pachytene stage of meiosis, likely by
differential looping of DNA along the meiotic chromosome axis
(e.g. left column shows smaller loops, resulting in
more loops total and in greater total axis length compared to the right
column with larger loops)[15,72–75]. This physical chromosome
length is correlated across chromosomes among cells from the same
individual[21,76] and correlates with
crossover number[15,20,21,42,73,76]. This length – measured as the length of the
chromosome axis or of the synaptonemal complex (the connector of homologous
chromosomes) – can vary two-fold or more among a human’s
spermatocytes[21].
We propose that the same process differs on average across individuals and
may substantially explain inter-individual variation in recombination rate.
On average, individual 1 (left) would have meiotic chromosomes that are
physically longer (less compacted) in an average cell than individual 2
(right); one example chromosome is shown in the figure. After the first
crossover on a chromosome (likely in a distal region of a chromosome, where
synapsis typically begins in male human meiosis before spreading across the
whole chromosome[13–15]), crossover interference
prevents nearby double-strand breaks (DSBs) from becoming crossovers; DSBs
far away can become crossovers (which themselves also cause interference).
More DSBs are likely created on physically longer chromosomes, and crossover
interference occurs among non-crossover as well as crossover DSBs[77]. Crossover interference
occurs over relatively fixed physical (micron) distances[43–45,76]; these
distances encompass different genomic (Mb) lengths of DNA in different cells
or on average in different people due to variable compaction. Thus,
crossover interference tends to lead to different total numbers of crossovers
as a function of degree of compaction, resulting in the observed negative
correlation (Fig. 2c,e) of crossover rate with crossover spacing (as
measured in base pairs). Given that the first crossover likely occurs in a
distal region of the chromosome, this model can also explain the negative
correlation (Fig. 2b,d) of crossover rate with the proportion of
crossovers in chromosome ends. Note: this figure shows the total number of
crossovers, crossover interference extent, and crossover locations for both
sister chromatids of each homolog combined; in reality, these crossovers are
distributed among the sister chromatids, making these relationships harder
to detect in daughter sperm cells and requiring large numbers of
observations to make relationships among these phenotypes clear.
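The core logic of this model is that interference acts over a fixed physical distance whose genomic span depends on compaction. A toy simulation can explore that logic. Every numeric value here is a hypothetical parameter chosen for illustration: a 100-Mb chromosome, a 10-µm interference distance, 2 DSBs per µm, and a first crossover placed at the most distal DSB. The sketch also ignores sister chromatids and interference among non-crossover DSBs.

```python
import random
import statistics
from typing import List, Tuple


def simulate_chromosome(genome_mb: float, mb_per_um: float, rng: random.Random,
                        interference_um: float = 10.0, dsb_per_um: float = 2.0) -> List[float]:
    """Toy model: sorted crossover positions (Mb) on one chromosome in one cell."""
    axis_um = genome_mb / mb_per_um                  # less compaction -> longer axis
    n_dsb = max(1, round(dsb_per_um * axis_um))      # longer axis -> more DSBs
    dsbs = [rng.uniform(0, axis_um) for _ in range(n_dsb)]
    # First crossover: the DSB closest to either chromosome end (distal).
    crossovers = [min(dsbs, key=lambda x: min(x, axis_um - x))]
    rng.shuffle(dsbs)
    for dsb in dsbs:
        # Interference acts over a fixed physical (micron) distance.
        if all(abs(dsb - xo) > interference_um for xo in crossovers):
            crossovers.append(dsb)
    return sorted(xo * mb_per_um for xo in crossovers)   # convert microns to Mb


def summarize(mb_per_um: float, n_cells: int = 500, seed: int = 1) -> Tuple[float, float]:
    """(mean crossovers per chromosome, median consecutive separation in Mb)."""
    rng = random.Random(seed)
    counts, seps = [], []
    for _ in range(n_cells):
        xos = simulate_chromosome(100.0, mb_per_um, rng)
        counts.append(len(xos))
        seps.extend(b - a for a, b in zip(xos, xos[1:]))
    return statistics.mean(counts), statistics.median(seps)


for compaction in (2.0, 4.0):   # Mb per micron; 2.0 = less compacted
    print(compaction, summarize(compaction))
```

In this toy model, crossover count scales roughly with axis length divided by the interference distance. The less-compacted setting therefore yields more crossovers that are spaced more closely in Mb, mirroring the negative correlation the model is meant to explain.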
Our enthusiasm for this model rests on multiple additional earlier
observations (Extended Data Fig. 11). Firstly,
at a cellular level, crossover interference occurs as a function of physical
(micron) distance along the meiotic chromosome axis or synaptonemal complex rather
than as a function of genomic (base pair) distance[43-45]. Secondly, the first crossover on a chromosome is more likely
to occur distally[13-15]. Such a model also predicts a
shared mechanism for sex differences in recombination rates and inter-individual
variation among individuals of the same sex: oocytes have a longer synaptonemal
complex, more crossovers, and decreased crossover interference as measured in
genomic distances than spermatocytes, but show the same extent of crossover
interference when it is measured in synaptonemal complex length[22,42,46,47].
Human genetics research has revealed that recombination phenotypes are
heritable and associate with common variants at many genomic loci[3-6,11,12]. A recent genome-wide association study
found that variation in crossover rate and placement is associated with variants
near genes that encode components of the synaptonemal complex, which connects and
compacts meiotic chromosomes, and with genes involved in the looping of homologs
along the chromosome axis[3]. Our
model predicts that inherited genetic variation at these loci may bias the average
degree of compaction of meiotic chromosomes; the fact that this same property varies
among cells from the same donor[20,21] shows that such variance is
well-tolerated and compatible with diverse-but-successful meiotic outcomes.
The sharing of co-varying phenotypes between the single-cell and
person-to-person levels suggests that a core biological mechanism shapes both inter-
and intra-individual (single-cell) variation in meiotic outcomes. Such parallelisms
between cell-biological and human-biological variation could in principle exist in a
wide variety of biological contexts.
Methods
A companion protocol for generating single-sperm libraries using the methods
presented here is available via Protocol Exchange[48]. Custom scripts (available via
Zenodo[49]) are referenced
by name in the sections describing analyses they perform. Recombination and
aneuploidy data generated via the described methods are also publicly
available[50]. All
statistical analyses were performed in R unless otherwise noted. Details on further
analysis methods are provided in the Supplementary Methods.
Sample information
Sperm samples from 20 anonymous, karyotypically normal sperm donors were
obtained from New England Cryogenic Center under a Not Human Subjects
determination from the Harvard Faculty of Medicine Office of Human Research
Administration (protocols M23743–101 and IRB16–0834). Donors
consented at the time of initial donation for samples to be used for research
purposes. The Not Human Subjects determination was based on the use of discarded
biospecimens that had been consented for research and the fact that researchers
had no interactions with the biospecimen donors and no access to identifiable
information about the biospecimens. The reviewing committee also reviewed and
approved our deposition of the data into an NIH repository. All experiments were
performed in accordance with all relevant guidelines and regulations. (Specimens
can be obtained from New England Cryogenic Center upon IRB approval.)
Samples arrived in liquid nitrogen in “egg yolk buffer” or
“standard buffer with glycerol” (no further buffer information
provided) and were aliquoted and stored in liquid nitrogen in the same
buffers.
Per sperm bank policy, donors are 18–38 years old at the time of
donation and precise age of donors is not released. Donor identifiers used in
the paper were created specifically for this study and are not linked to any
external identifiers.
ddPCR to evaluate genome accessibility
To evaluate how often regions from two different chromosomes co-occurred
(as would be expected for DNA contained within intact cells), we performed droplet digital PCR with naked
DNA, untreated sperm cells, or sperm cells decondensed as described subsequently
but with variable heat incubation times. For each assay targeting each
chromosome, a 20× assay mix was created by combining 25.2 μL of
100 μM forward primer (IDT), 25.2 μL of 100 μM reverse
primer (IDT), and 7 μL of 100 μM probe (IDT for FAM-labelled
probes, Life Technologies for VIC-labelled probes) with 82.6 μL ultrapure
water. ddPCR was performed as described previously[51], following section 3.2 steps
4–12, but with untreated sperm or sperm DNA florets as input instead of
naked DNA.
For this analysis, chromosome 7 was targeted with an assay to intergenic
region chr7:106552149–106552176 (hg38); forward primer sequence:
CGTAATGGGGCACAGGGATATA; reverse primer sequence: CTGTGAGAGGTAGAGAATCGCC; probe
sequence: CACAGAGTCCATTTGCAGCACCTCAGT; probe fluorophore: FAM. Chromosome 10 was
targeted with an assay to RPP30 at
chr10:92631759–92631820; forward primer sequence: GATTTGGACCTGCGAGCG;
reverse primer sequence: GCGGCTGTCTCCACAAGT; probe sequence: CTGACCTGAAGGCTCT;
probe fluorophore: VIC.
We calculated the percentage of molecules expected to be linked from
each reaction following Regan et al.[52].
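The 20× assay mix composition described above implies the following oligonucleotide concentrations. This arithmetic sketch assumes that all three stocks are exactly 100 µM and that the 20× mix is diluted 20-fold in the final reaction.

```python
from typing import Dict, Tuple


def assay_mix_concentrations(volumes_ul: Dict[str, float], stock_um: float = 100.0,
                             fold: float = 20.0) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Total volume (uL), oligo concentrations in the 20x mix (uM) and in 1x reactions (nM)."""
    total_ul = sum(volumes_ul.values())
    mix_um = {name: stock_um * vol / total_ul
              for name, vol in volumes_ul.items() if name != "water"}
    reaction_nm = {name: conc / fold * 1000 for name, conc in mix_um.items()}
    return total_ul, mix_um, reaction_nm


total, mix, final = assay_mix_concentrations(
    {"forward": 25.2, "reverse": 25.2, "probe": 7.0, "water": 82.6})
print(total, mix, final)
```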
Sperm cell library generation
Accessible sperm nuclei “florets” were generated using a
combination of published decondensation protocols[53,54] with some modifications. Sperm aliquots containing
>200,000 cells were thawed on ice and then washed by spinning for 10
minutes at 400 g at 4°C. The pellet was resuspended in 10 μL
phosphate-buffered saline (PBS, Gibco/LifeTechnologies) and re-centrifuged under
the same conditions. The sperm pellet was resuspended in 2.5 μL of a
sucrose buffer containing 250 mM sucrose (Sigma), 5 mM MgCl2 (Sigma),
and 10 mM Tris HCl (pH 7.5, Thermo Scientific). Sperm aliquots were submerged in
liquid nitrogen and immediately quick-thawed by holding them in a warm fist;
three such freeze-thaw cycles were performed.
Freeze-thawed sperm solution was combined with 22.5 μL
decondensation buffer (113 mM KCl [Sigma], 12.5 mM KH2PO4
[Sigma], 2.5 mM Na2HPO4 [Sigma], 2.5 mM MgCl2
[Sigma], and 20 mM Tris [Thermo Scientific] freshly supplemented with 150
μM heparin [sodium salt from porcine, Sigma H3393] and 2 mM
beta-mercaptoethanol [Sigma]). The reaction was incubated at 37°C for 45
minutes. To allow enzymatic DNA amplification, heparin was inactivated by mixing
the sperm solution with 0.5 U heparinase I (Sigma H2519) by gently pipetting and
incubating at room temperature for 2 hours[55].
The sperm solution was moved to ice, and sperm floret concentration was
determined by diluting 1:100 with PBS, staining with 1X SYBR Green I (Thermo
Scientific) and counting using the green fluorescence channel at 10x
magnification.
Droplets were prepared using the following modifications to 10X
Genomics’ GemCode (version 1[29]) User Guide Revision C (in place of steps
5.1–5.3.9); all reagents come from the 10X Genomics GemCode kit.
Ultrapure water was combined with 10,833 sperm to a final volume of 5 μL;
10,000 sperm were used for library generation. To each sperm sample was added 60
μL of a master mix containing 32.5 μL GemCode reagent mix, 1.5
μL primer release agent, 9.2 μL GemCode polymerase, and 16.8
μL ultrapure water.
GemCode beads were vortexed at full speed for 25 seconds, and then
diluted 1:11 with ultrapure water to a total volume of at least 90 μL per
sample. Per 10X’s protocol, 60 μL of sample-master mix combination
was added to the droplet generation chip, followed by 85 μL of freshly
pipette-mixed 1:11-diluted bead mixture and 150 μL of droplet generation
oil.
Droplets were generated and processed through library generation
following 10X Genomics’ GemCode (version 1) User Guide Revision C (step
5.3.10 through the end of section 6).
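The loading numbers above are consistent with a simple volume ratio. Of the 65 µL sample-plus-master-mix combination, 60 µL is transferred to the chip. The sketch below shows that arithmetic. The function name and defaults are ours.

```python
def sperm_to_combine(target_loaded: float, sample_ul: float = 5.0,
                     master_mix_ul: float = 60.0, transferred_ul: float = 60.0) -> float:
    """Sperm to add so that `transferred_ul` of the combined mix carries `target_loaded` cells."""
    combined_ul = sample_ul + master_mix_ul
    return target_loaded * combined_ul / transferred_ul


master_mix_parts_ul = [32.5, 1.5, 9.2, 16.8]   # reagent mix, primer release agent, polymerase, water
print(round(sperm_to_combine(10_000)), sum(master_mix_parts_ul))
```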
Sequencing and sequence data processing
Two libraries were generated per sperm donor and additional libraries
were generated for four initial samples with low cell counts.
Four or five libraries were sequenced at a time on S2 200-cycle flow cells on an Illumina
NovaSeq. The read structure was 178 cycles read 1, 8 cycles read 2 (index read
one), 14 cycles read 3 (index read two containing the cell barcode; later
treated as the reverse read), and 5 cycles read 4 (unused; included to fulfill
the NovaSeq’s paired-end requirement).
To convert the data to mapped BAM files with cell and molecular barcodes
encoded as read tags, we used Picard Tools v2.2 (http://broadinstitute.github.io/picard) and Drop-seq Tools v2.2
(https://github.com/broadinstitute/Drop-seq/releases; see
https://github.com/broadinstitute/Drop-seq/blob/master/doc/Drop-seq_Alignment_Cookbook.pdf
for details on running many of the tools)[28].
Illumina BCL files were converted to unmapped BAM files using
Picard’s ExtractIlluminaBarcodes and IlluminaBasecallsToSam with read
structure 178T8B14T (cell barcodes, present in the i5 index, were incorporated
as read 2 for ease of downstream processing). BAMs were processed to include
unique molecular identifiers (UMIs) and cell barcodes as read tags, and to
exclude reads with poor-quality cell barcodes or UMIs; consequently, each read
was retained as a single-end read, with the 14-bp cell barcode stored in tag XC and the 10-bp
UMI stored in tag XM. The first
10 bp of read 1 were used as the UMI. First, Drop-seq Tools’
TagBamWithReadSequenceExtended was called with BASE_RANGE=1-14,
BASE_QUALITY=10, BARCODED_READ=2, DISCARD_READ=true, TAG_NAME=XC,
NUM_BASES_BELOW_QUALITY=1. Subsequently, TagBamWithReadSequenceExtended was
called again with BASE_RANGE=1-10, BASE_QUALITY=10, HARD_CLIP_BASES=true,
BARCODED_READ=1, DISCARD_READ=false, TAG_NAME=XM, NUM_BASES_BELOW_QUALITY=1.
Finally, Drop-seq Tools’ FilterBAM was called with parameter
TAG_REJECT=XQ.
Reads were aligned to hg38 using bwa mem[56] v0.7.7-r441. BAMs were converted to
FASTQ using Picard’s SamToFastq, FASTQ reads were aligned using bwa mem
-M, and then unmapped BAMs were merged with mapped BAMs using Picard’s
MergeBamAlignment, with non-default options INCLUDE_SECONDARY_ALIGNMENTS=false
and PAIRED_RUN=false. Reads were marked as PCR duplicates using Drop-seq
Tools’ SpermSeqMarkDuplicates (part of Drop-seq Tools v2.2 and above)
with options STRATEGY=READ_POSITION, CELL_BARCODE_TAG=XC,
MOLECULAR_BARCODE_TAG=XM, NUM_BARCODES=20000, CREATE_INDEX=true. BAM files for
all lanes and index sequences from the same sample were merged using
Picard’s MergeSamFiles before alignment and/or during duplicate marking,
with all BAMs given as input to SpermSeqMarkDuplicates.
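The effect of the two TagBamWithReadSequenceExtended calls plus the FilterBAM step on a single read can be sketched in plain Python. This is not Drop-seq Tools code: `tag_read` and `TaggedRead` are hypothetical names, qualities are assumed to be Phred+33 encoded, and we assume a barcode is rejected when at least NUM_BASES_BELOW_QUALITY of its bases fall below BASE_QUALITY (the exact inequality used by the real tool is an assumption). Rejected reads are returned as `None` here instead of being tagged XQ and filtered afterwards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaggedRead:
    name: str
    seq: str   # read 1 after hard-clipping the UMI bases
    qual: str
    xc: str    # cell barcode (tag XC)
    xm: str    # UMI (tag XM)


def n_below(qual: str, min_q: int) -> int:
    """Count bases whose Phred+33-encoded quality is below min_q."""
    return sum(1 for c in qual if ord(c) - 33 < min_q)


def tag_read(name: str, r1_seq: str, r1_qual: str, bc_seq: str, bc_qual: str,
             cell_range=(1, 14), umi_range=(1, 10),
             base_quality=10, num_bases_below_quality=1) -> Optional[TaggedRead]:
    """Return a tagged single-end read, or None if the barcode or UMI fails the filter."""
    c0, c1 = cell_range[0] - 1, cell_range[1]   # 1-based inclusive range -> slice
    u0, u1 = umi_range[0] - 1, umi_range[1]
    cell, cell_q = bc_seq[c0:c1], bc_qual[c0:c1]
    umi, umi_q = r1_seq[u0:u1], r1_qual[u0:u1]
    # Assumption: reject when at least `num_bases_below_quality` bases are low quality.
    if (n_below(cell_q, base_quality) >= num_bases_below_quality
            or n_below(umi_q, base_quality) >= num_bases_below_quality):
        return None
    # Hard-clip the UMI bases from read 1; the barcode read itself is discarded.
    return TaggedRead(name, r1_seq[:u0] + r1_seq[u1:], r1_qual[:u0] + r1_qual[u1:], cell, umi)
```

For example, `tag_read("r1", "ACGTACGTAC" + "TTTTGGGG", "I" * 18, "AAAACCCCGGGGTT", "I" * 14)` yields a read with XC `AAAACCCCGGGGTT`, XM `ACGTACGTAC`, and sequence `TTTTGGGG`.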
Variant calling, sperm cell genotyping
For each donor, we pooled all reads from all libraries, including reads
that did not derive from a barcode associated with a complete sperm cell. Using
GATK v3.7[57,58] in hg38, we followed GATK’s best
practices documentation for base quality score recalibration, gVCF generation
using HaplotypeCaller (in DISCOVERY mode with -stand_call_conf 20), and joint
genotyping with GenotypeGVCFs. We filtered variants with SelectVariants
-selectType SNP and VariantFiltration (--filterExpression
“QD<3.0”). We then performed VQSR following GATK’s
best practices, except that we excluded annotations MQ and DP
(VariantRecalibrator with GATK provided resources; -an QD, MQRankSum,
ReadPosRankSum, FS, and SOR; -mode SNP; --trustAllPolymorphic; and tranches 90,
99.0, 99.5, 99.9, and 100.0). We applied tranche 99.9 recalibration using
ApplyRecalibration -mode SNP and obtained the names of SNPs from dbSNP
146[59] using
VariantAnnotator --dbsnp. We filtered our sites to contain only biallelic SNPs
present in Hardy–Weinberg equilibrium in 1000 Genomes Phase 3[60] using SelectVariants
--concordance with a VCF containing only these sites (from GATK’s
resource bundle). We excluded SNPs in centromeric regions or acrocentric arms as
defined by the UCSC Genome Browser’s cytoband track[61,62] (http://genome.ucsc.edu; the same centromere boundaries were used
in all analyses) and those in known paralogous regions as lifted over from
Genovese et al 2014[63]. We
selected only heterozygous SNPs using SelectVariants -selectType SNP
--selectTypeToExclude INDEL --restrictAllelesTo BIALLELIC --excludeFiltered
--setFilteredGtToNocall --selectexpressions
'vc.getGenotype("…").isHet()'.
We identified SNPs present in each sperm cell, and which allele was
present, using GenotypeSperm (part of Drop-seq Tools v2.2 and above). For
downstream analyses, we generated a file with columns cell, pos, and gt, with gt
having the value 0 for the reference allele and 1 for the alternate allele, for
SNPs covered by one or more UMIs that all showed the same single base, matching either the reference or the
alternate allele. (See our script gtypesperm2cellsbyrow.R.)
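A minimal sketch of the cell/pos/gt table construction, assuming per-UMI base calls are already available as `(cell, pos, base)` tuples and the donor's heterozygous SNPs as a `pos -> (ref, alt)` mapping. The function name and input layout are hypothetical and do not reproduce GenotypeSperm's output format or the authors' R script.

```python
from collections import defaultdict


def cells_by_row(umi_obs, alleles):
    """umi_obs: iterable of (cell, pos, base), one entry per UMI.
    alleles: dict pos -> (ref, alt) for the donor's heterozygous SNPs.
    Returns sorted (cell, pos, gt) rows with gt 0 = reference, 1 = alternate."""
    bases = defaultdict(set)
    for cell, pos, base in umi_obs:
        if pos in alleles:
            bases[(cell, pos)].add(base)
    rows = []
    for (cell, pos), seen in bases.items():
        if len(seen) != 1:
            continue  # UMIs disagree at this SNP: discard the observation
        (base,) = seen
        ref, alt = alleles[pos]
        if base == ref:
            rows.append((cell, pos, 0))
        elif base == alt:
            rows.append((cell, pos, 1))
        # any other base matches neither parental allele and is dropped
    return sorted(rows)
```

Observations with conflicting UMIs or a third base are dropped, mirroring the filtering described above.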
Chromosome-scale phasing
We identified barcodes potentially associated with cells by plotting the
cumulative fraction of reads associated with each ranked barcode and identifying
the inflection point of this curve (see Extended
Data Fig. 1f). We then included only barcodes with substantial read
depth on either the X or the Y chromosome but not both, as the vast majority of
sperm cells should contain only one sex chromosome. (We later added these
barcodes back in before formally identifying and excluding cell doublets.)
To phase sperm donors’ genomes, we used all quality-controlled
heterozygous sites in these cell barcodes expected to correspond to sperm cells,
excluding observations of SNPs where the observed allele was not the reference
or alternate allele in the parental genome or where more than one allele was
observed. For each chromosome, we converted per-cell SNP calls into
“fragments” for input into the HapCUT phasing software[64,65] by considering each consecutive pair of SNPs observed
in a cell to be a fragment (see our script gtypesperm2fmf.R).
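The conversion of per-cell SNP calls into consecutive-pair fragments can be sketched as follows. It assumes each cell's calls are given as `(snp_index, allele)` pairs, where `snp_index` is the SNP's rank among the chromosome's heterozygous sites; the function name is hypothetical, and writing the pairs out in HapCUT's fragment-file text format is deliberately left out (consult the HapCUT documentation for that layout).

```python
def consecutive_pair_fragments(cell_calls):
    """cell_calls: dict cell -> list of (snp_index, allele), allele 0 or 1.
    Returns (cell, (i, a_i), (j, a_j)) for each pair of consecutively observed SNPs."""
    fragments = []
    for cell, calls in cell_calls.items():
        ordered = sorted(calls)  # chromosomal order by SNP index
        for (i, a), (j, b) in zip(ordered, ordered[1:]):
            fragments.append((cell, (i, a), (j, b)))
    return fragments
```

A cell with observed SNPs at indices 2, 5, and 9 contributes two fragments, (2, 5) and (5, 9); a cell with a single observed SNP contributes none. Restricting fragments to consecutive pairs means that a crossover can disrupt at most one fragment per cell.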
We then used HapCUT with parameter --maxiter 100 to infer chromosome-scale
phase. After identifying and removing cell doublets (see below), we repeated
phasing with only non-doublet cell barcodes.
To validate our phasing method, we simulated single-cell SNP
observations from known haplotypes, including 2% genotype errors and a variable
percentage of cell doublets. Briefly, sites were randomly sampled from one known
haplotype of chromosome 17 until a crossover location probabilistically assigned
based on the deCODE recombination map[6], then sampled from the other haplotype (one crossover
was simulated per cell). To simulate PCR or sequencing errors, 2% of the sites
were randomly assigned to one of the two alleles. Doublets were simulated by combining two
cells and retaining 70% of the observed sites at random. We performed five
random simulations for each doublet proportion, mean proportion of sites
“observed” in each cell, and number of cells simulated, and then
followed our phasing protocol using each simulation. (See our script
simulatespermseqfromhaps.py.)
To further validate phasing, we used Sperm-seq data to phase one
donor’s genome and compared these phased haplotypes to this
donor’s Eagle[66,67]-generated haplotypes. We
compared the phase relationship between each consecutive pair of SNPs
(identifying the proportion of switch errors between the two phased sets). We
also compared the Sperm-seq allele-allele phase of all pairs of alleles in
perfect linkage disequilibrium in 1000 Genomes Phase 3[60] in the populations matching the
donor’s ancestry.
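The switch-error comparison between two phased sets can be sketched as below. This is a minimal illustration, not the authors' code: it assumes each phasing is a mapping from SNP position to the haplotype (0 or 1) carrying the alternate allele, and that "consecutive" means consecutive among SNPs phased in both sets. The function name is hypothetical.

```python
def switch_error_rate(phase_a, phase_b):
    """phase_a, phase_b: dict position -> 0/1, the haplotype carrying the alternate allele.
    Returns the fraction of consecutive shared SNP pairs whose relative phase disagrees."""
    shared = sorted(set(phase_a) & set(phase_b))
    if len(shared) < 2:
        return float("nan")
    switches = 0
    for s, t in zip(shared, shared[1:]):
        rel_a = phase_a[s] ^ phase_a[t]  # 0: same haplotype, 1: opposite haplotypes
        rel_b = phase_b[s] ^ phase_b[t]
        switches += rel_a != rel_b
    return switches / (len(shared) - 1)
```

Because only the relative phase of neighbouring SNPs is compared, a consistent global swap of the haplotype labels (the two haplotypes are unlabeled in both methods) produces no switch errors.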
Cell doublets
To identify cell barcodes associated with more than one sperm cell (cell
doublets), we detected consecutively observed SNP alleles that appeared on
different parental haplotypes, which could occur because of crossover, error, or
the presence of two haplotypes in the same droplet (doublet). We ranked barcodes
by the proportion of consecutive SNPs that spanned haplotypes using all SNPs
from all autosomes except the autosome with the most haplotype-spanning
consecutive SNPs (so as to avoid mistakenly identifying cells with chromosome
gains as doublets); this resulted in a clear inflection point wherein cell
doublets had a quickly accelerating proportion of haplotype-spanning consecutive
SNPs (Extended Data Fig. 2d-f). All cell barcodes below this inflection point
(identified with the function ede from the R package
inflection,
https://CRAN.R-project.org/package=inflection) were considered
non-doublet (Extended Data Fig. 2f). (See
our script computeSwitchesandInflThresh.R.) Even though we
exclude the autosome with the most haplotype-spanning consecutive SNPs from
doublet identification, any cells with multiple chromosome gains (especially
more than two) or whole-genome diploidy would still be excluded by this method.
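The per-barcode statistic used for doublet ranking can be sketched as follows. It assumes each barcode's observed SNPs are available per autosome as `(position, haplotype)` pairs and, following the wording above, drops the autosome with the largest count of haplotype-spanning consecutive pairs before pooling. The function name is hypothetical, and the inflection-point threshold (R's `ede`) is not reimplemented here.

```python
def haplotype_switch_proportion(calls_by_chrom):
    """calls_by_chrom: dict autosome -> list of (pos, hap), hap in {1, 2}, for one barcode.
    Returns the proportion of consecutive observed SNP pairs on different haplotypes,
    pooled over all autosomes except the one with the most such pairs."""
    per_chrom = {}
    for chrom, calls in calls_by_chrom.items():
        haps = [h for _, h in sorted(calls)]
        pairs = max(len(haps) - 1, 0)
        spans = sum(a != b for a, b in zip(haps, haps[1:]))
        per_chrom[chrom] = (spans, pairs)
    if not per_chrom:
        return float("nan")
    # Drop the autosome most likely to be gained (most haplotype-spanning pairs).
    worst = max(per_chrom, key=lambda c: per_chrom[c][0])
    spans = sum(s for c, (s, _) in per_chrom.items() if c != worst)
    pairs = sum(p for c, (_, p) in per_chrom.items() if c != worst)
    return spans / pairs if pairs else float("nan")
```

A single haploid cell yields a small proportion (a few crossovers and errors), whereas a doublet, which carries both haplotypes wherever the two cells differ, approaches alternation between haplotypes at consecutive SNPs.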
Crossover events
We identified crossover events on all autosomes (excluding the
p arms of acrocentric chromosomes, where SNPs had been excluded
from analysis) by finding transitions between tracts of SNPs with alleles
matching different parental haplotypes using a Hidden Markov Model written in R
with package HMM (https://CRAN.R-project.org/package=HMM). To ensure that we
detected crossovers located near the ends of SNP coverage (sub-telomeric regions
are frequently used for crossovers in spermatogenesis), we ran the HMM both in
the forward chromosomal and reverse-chromosomal directions, with start
probability for one haplotype equal to 1 if the first two SNPs observed were of
that haplotype. In addition to two states for parental haplotypes, we included a
third “error” state to capture cases in which a haplotype 1 allele
is observed in a haplotype 2 region (and vice versa), e.g., due
to PCR or sequencing error, gene conversion, or cases in which a small piece of
off-haplotype ambient DNA was captured in a droplet. Crossovers were called where one
haplotype transitioned to another, or where one haplotype transitioned to the
error state and then to the other haplotype. Crossover boundaries were the last
SNP in the first haplotype and the first in the next. The key parameters for
this algorithm are the transition probability between haplotypes (set to 0.001,
from the per-cell median 26 crossovers divided by the per-cell median 24,710
heterozygous SNPs) and transition probability into and out of the
“error” state (we set transition probability into this state to
0.03 from either haplotype, as only a few percent of SNPs are off-haplotype; we
set the probability of staying in error to 0.9 to allow for the occasional tract
of SNPs from an ambient piece of off-haplotype DNA). Emission probabilities were
100% haplotype 1 alleles from haplotype 1, 100% haplotype 2 alleles from
haplotype 2, and equal probability haplotype 1 or 2 alleles from the third
“error” state. Crossover calling was robust to a range of low
transition probabilities. (See our script
spseqHMMCOCaller_3state.R, which calls crossovers on one
chromosome.)
After aneuploidy identification, we marked aneuploid chromosomes as
having no crossovers for all crossover analyses (absent chromosomes have no
crossovers and crossovers are called differently on gained chromosomes,
described subsequently).
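The three-state model can be illustrated with a small Viterbi decoder in pure Python. This is a sketch under stated assumptions, not the authors' R implementation (which uses the R package HMM). Assumptions: observations are haplotype labels (1 or 2) per observed SNP in chromosomal order; the probability of staying on a haplotype is 1 − 0.001 − 0.03 = 0.969; leaving the error state goes to either haplotype with equal probability (0.05 each); and when the first two SNPs disagree, the start is split evenly between haplotypes. Only the forward direction is shown; the reverse pass would run the same code on the reversed observations. The switch probability 0.001 comes from 26/24,710 ≈ 0.00105.

```python
import math

STATES = ("hap1", "hap2", "error")
NEG_INF = float("-inf")


def _log(p):
    return math.log(p) if p > 0 else NEG_INF


def make_model(p_switch=0.001, p_to_error=0.03, p_stay_error=0.9):
    stay = 1.0 - p_switch - p_to_error
    leave_error = (1.0 - p_stay_error) / 2  # assumption: split evenly between haplotypes
    trans = {
        "hap1": {"hap1": stay, "hap2": p_switch, "error": p_to_error},
        "hap2": {"hap1": p_switch, "hap2": stay, "error": p_to_error},
        "error": {"hap1": leave_error, "hap2": leave_error, "error": p_stay_error},
    }
    emit = {
        "hap1": {1: 1.0, 2: 0.0},
        "hap2": {1: 0.0, 2: 1.0},
        "error": {1: 0.5, 2: 0.5},
    }
    return trans, emit


def viterbi(obs, trans, emit):
    """obs: observed haplotype labels (1 or 2) in chromosomal order; returns the state path."""
    if len(obs) >= 2 and obs[0] == obs[1]:
        start = {s: 1.0 if s == f"hap{obs[0]}" else 0.0 for s in STATES}
    else:
        start = {"hap1": 0.5, "hap2": 0.5, "error": 0.0}  # assumption
    score = {s: _log(start[s]) + _log(emit[s][obs[0]]) for s in STATES}
    back = []
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + _log(trans[p][s]))
            new_score[s] = score[prev] + _log(trans[prev][s]) + _log(emit[s][o])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def call_crossovers(positions, path):
    """Crossover = change of haplotype, directly or via the error state.
    Boundaries: last SNP on the first haplotype, first SNP on the next."""
    crossovers, last_hap, last_idx = [], None, None
    for i, state in enumerate(path):
        if state == "error":
            continue
        if last_hap is not None and state != last_hap:
            crossovers.append((positions[last_idx], positions[i]))
        last_hap, last_idx = state, i
    return crossovers
```

With ten haplotype-1 SNPs followed by ten haplotype-2 SNPs at positions 100, 200, …, 2000, the decoder reports one crossover bounded by positions 1000 and 1100; a single off-haplotype allele inside a haplotype-1 tract is absorbed by the error state and produces no crossover.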
Identifying even-coverage cell barcodes
We used Genome STRiP v2.0 (GS) (http://software.broadinstitute.org/software/genomestrip/)
[68,69] to determine sequence read depth
(observed number of reads divided by expected number of reads) in bins of 100 kb
of uniquely mappable sequence across the genome in each sperm cell, using
GS’s default GC bias correction and repetitive region masking for hg38.
We divided read depth by 2 to obtain read depth per haploid rather than diploid
genome. Input to GS was a BAM file containing only cells of interest with read
groups set to : (created
using Drop-seq Tools’ ConvertTagToReadGroup with options
CELL_BARCODE_TAG=XC, SAMPLE_NAME=,
CREATE_INDEX=true, and CELL_BC_FILE=list of barcodes potentially associated with
cells, described above).
A minority of cell barcodes were associated with eccentric read depth
across many chromosomes, with wave-like read depth vacillating between 0 and
≥2. (We hypothesize that these cell barcodes were associated with sperm
nuclei that did not properly decondense, such that some regions of the genome
were more accessible than others, leading to undulating read depth across more
and less accessible chromatin.) To identify and exclude such barcodes, we
treated read depths across each chromosome as a time series and used Box-Jenkins
Autoregressive Integrated Moving Average (ARIMA) modelling to model how read
depth observations relied on their previous values and their overall averages
(implemented via the R package forecast[70,71], excluding differencing). By visual inspection, we
determined that chromosomes with certain ARIMA criteria were likely to have
undulating read depth, and that cell barcodes with five or more such identified
chromosomes were likely to have eccentric read depth globally. We flagged
individual chromosomes if (1) the sum of AR1 and AR2 coefficients was greater
than 0.7, the AR1 coefficient was greater than 0.9, or the net sum of all AR and
MA coefficients was greater than 1.25, and (2) either the net sum of AR and MA
coefficients was greater than 0.4 or the intercept was less than 0.8 or greater
than 1.2. If both criteria in (2) were met, this signified an exceedingly odd
chromosome, which we counted twice. Cell barcodes with five or more chromosomes
flagged in this way were excluded from downstream analyses. (Because gains of
large amounts of the genome cause artificially depressed read depths on
non-gained chromosomes, we manually examined any cells with a large range of
ARIMA intercepts and over five chromosomes denoted as unstable. Any such cells
that had simply gained a large proportion of the genome, e.g.,
3 copies of chromosome 2, were included rather than excluded.) We
cross-referenced all cell exclusions with called aneuploidies, confirming that
cells were not excluded simply on the basis of having lost or gained a
chromosome. (See our scripts setupgsreaddepth.R,
exclbadreaddepth_arima_1.R, exclbadreaddepth_initid_2.R,
and exclbadreaddepth_finalize_3.R.)
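The chromosome-flagging rule can be expressed as a small function. This sketch assumes the ARIMA fits (done in R with the forecast package, without differencing) have already produced each chromosome's AR coefficients, MA coefficients, and intercept; `ArimaFit`, `chromosome_flag_weight`, and `exclude_cell` are hypothetical names. The manual review of cells with large genome gains is not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArimaFit:
    ar: List[float] = field(default_factory=list)  # AR coefficients, AR1 first
    ma: List[float] = field(default_factory=list)  # MA coefficients
    intercept: float = 1.0


def chromosome_flag_weight(fit: ArimaFit) -> int:
    """0 = not flagged, 1 = flagged, 2 = flagged with both criteria in (2) met."""
    ar = fit.ar + [0.0, 0.0]  # pad so AR1 and AR2 always exist
    net = sum(fit.ar) + sum(fit.ma)
    crit1 = (ar[0] + ar[1] > 0.7) or (ar[0] > 0.9) or (net > 1.25)
    crit2a = net > 0.4
    crit2b = fit.intercept < 0.8 or fit.intercept > 1.2
    if not (crit1 and (crit2a or crit2b)):
        return 0
    return 2 if (crit2a and crit2b) else 1


def exclude_cell(fits, threshold=5) -> bool:
    """Exclude a barcode when its weighted count of flagged chromosomes reaches the threshold."""
    return sum(chromosome_flag_weight(f) for f in fits) >= threshold
```

The double weighting lets a few extreme chromosomes reach the five-chromosome threshold sooner than many mildly undulating ones.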
Replicate barcodes (“bead doublets”)
One sperm cell can be encapsulated in a droplet with more than one
barcoded bead. To identify such cases, where pairs of sperm genomes were
identical, we determined the proportion of SNPs that were of the same haplotype
for each pair of barcodes. We imputed the haplotype of all heterozygous SNPs
based on the haplotype of surrounding observed SNPs and locations of
recombination events and compared SNP haplotypes across sperm cell pairs. SNP
observations between boundaries of crossovers were excluded from analysis. Sperm
cells shared on average 50% of their genomes, but a few sets of barcodes shared
nearly 100% of their SNP haplotypes (Extended Data
Fig. 3a). We considered these pairs “bead doublets” or
replicate barcodes. In all downstream analyses, only one barcode (chosen
randomly) from a set corresponding to the same cell was used. (See our scripts
imputeHaplotypeAllSNPs.R, compareSpermHapsPropSNPs.R,
combineChrsSpermHapsPropSNPs.R, and
curateNonRepBCList.R.)
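The imputation-and-comparison step can be sketched as follows, assuming each barcode's error-cleaned observations are sorted `(position, haplotype)` pairs. A SNP takes the haplotype of its flanking observed SNPs when they agree, takes the single neighbour's haplotype near chromosome ends, and is left unimputed (excluded) when the flanking SNPs disagree, i.e. when it lies between crossover boundaries. Function names are hypothetical and this is not the authors' R code.

```python
import bisect


def impute_haplotypes(observed, snp_positions):
    """observed: sorted list of (pos, hap) for one barcode.
    Returns dict pos -> hap, or None where the SNP lies inside a crossover interval."""
    obs_pos = [p for p, _ in observed]
    out = {}
    for p in snp_positions:
        i = bisect.bisect_left(obs_pos, p)
        if i < len(obs_pos) and obs_pos[i] == p:
            out[p] = observed[i][1]  # directly observed
            continue
        left = observed[i - 1][1] if i > 0 else None
        right = observed[i][1] if i < len(observed) else None
        if left is None or right is None:
            out[p] = left if left is not None else right  # near a chromosome end
        elif left == right:
            out[p] = left
        else:
            out[p] = None  # between crossover boundaries: excluded
    return out


def proportion_same_haplotype(imp_a, imp_b):
    """Fraction of SNPs imputed in both barcodes that carry the same haplotype."""
    shared = [p for p, h in imp_a.items() if h is not None and imp_b.get(p) is not None]
    if not shared:
        return float("nan")
    return sum(imp_a[p] == imp_b[p] for p in shared) / len(shared)
```

Random barcode pairs are expected to share about 50% of imputed haplotypes, while replicate barcodes sampling the same genome should share nearly 100%, even though they observe largely different SNPs.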
Crossover zones
To define regions of recombination use, we found local minima of the
density (computed with R’s built-in density function) of all crossovers’ median positions
across all samples on each chromosome. Minima were identified using the
findPeaks function (from https://github.com/stas-g/findPeaks) on the inverse density with
m=3. Crossover zones run from the beginning of the chromosome (including the
whole p arm for acrocentric chromosomes) to the location of the
first local minimum, from the location of the first local minimum plus one
basepair to the next local minimum, etc., with the last zone on
each chromosome ending at the chromosome end. (See our script
findcozones_peaks.R.)
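The zone construction can be approximated in Python with NumPy. This sketch swaps in a hand-written Gaussian kernel density with Silverman's rule-of-thumb bandwidth (the same rule as R's default `bw.nrd0`) evaluated on a user-chosen grid, and an approximation of `findPeaks` applied to the negated density. Grid size and boundary handling differ from R's `density`, so zone boundaries could shift slightly; all function names are hypothetical.

```python
import numpy as np


def bw_nrd0(x):
    """Silverman's rule-of-thumb bandwidth, as in R's default bw.nrd0."""
    x = np.asarray(x, dtype=float)
    hi = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    lo = min(hi, (q75 - q25) / 1.34)
    if lo == 0:
        lo = hi or abs(x[0]) or 1.0
    return 0.9 * lo * len(x) ** -0.2


def gaussian_density(x, grid):
    """Gaussian kernel density of x evaluated at each grid point."""
    x = np.asarray(x, dtype=float)
    h = bw_nrd0(x)
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))


def local_minima(y, m=3):
    """Approximates findPeaks(-y, m): turning points of y no higher than the m points on each side."""
    x = -np.asarray(y, dtype=float)
    shape = np.diff(np.sign(np.diff(x)))
    minima = []
    for i in np.flatnonzero(shape < 0):
        k = i + 1
        lo, hi = max(k - m, 0), min(k + m + 1, len(x))
        if np.all(x[lo:hi] <= x[k]):
            minima.append(int(k))
    return minima


def crossover_zones(minima_bp, chrom_end):
    """Zones: bp 1 to the first minimum, each minimum + 1 to the next, last minimum + 1 to the end."""
    bounds = sorted(int(b) for b in minima_bp)
    return list(zip([1] + [b + 1 for b in bounds], bounds + [chrom_end]))
```

Usage: evaluate `gaussian_density(midpoints, grid)` on a grid spanning the chromosome, convert the indices from `local_minima` to base-pair positions via `grid`, and pass them to `crossover_zones`. For example, `crossover_zones([30, 60], 100)` returns `[(1, 30), (31, 60), (61, 100)]`.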
Aneuploidy and chromosome arm loss/gain
As described previously (see “Identifying even-coverage cell barcodes”), we used Genome STRiP (http://software.broadinstitute.org/software/genomestrip/)[68,69] to determine read depth in each sperm cell in 100 kb
bins. We located chromosomes or chromosome arms with aberrant read depth to
identify aneuploidy.
We excluded genomic regions that had outlying read depth across all
cells, defined as those with p < 0.05 in a one-sided
one-sample t-test (looking for increased read depth) against
the expected mean read depth of 2# (defined below). To identify gains of
autosomes, we performed a one-sided one-sample t-test
(expecting increased read depth in a gain) for each cell against expected read
depth for a gain of one copy, 2#. For each cell, this analysis compared the
distribution of all bins’ read depths across a region of interest to the gain
expectation 2#, and flagged any cells whose read depth distributions were not
significantly different (p ≥ 0.05). We used the same
approach to identify losses, comparing a cell’s read depth distribution
across bins to 0.1 and flagging any that were not significantly higher
(p ≥ 0.05).
The expected copy number for gains is 2, but the expected read depth for
gains depends on the size of the chromosome: a library corresponding to a cell
with a chromosome gain has more reads than would be in that same library without
a gain. This phenomenon pulls read depth down globally by increasing the total
number of expected reads, causing the denominator in each read depth bin (the
expected number of reads in that bin) to increase. Therefore, we computed a
chromosome-specific critical read depth value for identifying gains: 2# = 2*(the
proportion of the genome in base pairs coming from all chromosomes other than
the tested one). For losses, we used 0.1 rather than 0 as the expected read
depth because a small number of reads generally align to every chromosome in
every library.
For non-acrocentric chromosomes, we performed aneuploidy calling for the
arms separately and for the whole chromosome. Because amplification of more than
two copies of a chromosome arm could result in the whole chromosome passing the
p-value threshold, we required a whole-chromosome event to
pass the p-value threshold at the whole-chromosome level and to
have rounded read depth of both arms ≥ 2 for a gain (or 0 for a loss).
For the acrocentric chromosomes, only the q arm was considered
and any q arm gain or loss was considered to be a
whole-chromosome event (unless investigated further).
For the sex chromosomes, we followed a similar statistical framework,
but a loss was only considered an aneuploidy if both the X and the Y chromosomes
were flagged as lost. A gain was called if both the X and Y chromosomes were
present. (See our scripts setupgsreaddepth.R, idaneus_initialttests.R,
curateaneudata_clean.R, getautosomalaneumatrix.R, and
getxykaryos_aneus.R for aneuploidy calling and output
formatting; see our scripts curateAnFreqFromCodeMatrix.R,
curateInitAnalyzeXYKaryos.R, and
combineAnFreq_AutXY.R for conversion of outputs of
aneuploidy calling to cross-donor aneuploidy frequency tables.)
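Two deterministic pieces of the aneuploidy rules are easy to sketch: the chromosome-specific gain threshold 2# and the whole-chromosome rule for non-acrocentric chromosomes. The one-sided t-tests themselves are omitted and assumed to have produced the `whole_flagged` result. The chromosome lengths in the usage example are toy values, not hg38 lengths, and the function names are hypothetical.

```python
def gain_critical_depth(chrom, lengths):
    """2# = 2 * (fraction of genome bp on chromosomes other than `chrom`)."""
    total = sum(lengths.values())
    return 2.0 * (total - lengths[chrom]) / total


def whole_chromosome_event(whole_flagged, p_arm_depth, q_arm_depth, kind):
    """Whole-chromosome call for a non-acrocentric chromosome: the whole-chromosome
    test must pass and both arms' rounded mean depths must be >= 2 (gain) or 0 (loss)."""
    if not whole_flagged:
        return False
    p, q = round(p_arm_depth), round(q_arm_depth)
    if kind == "gain":
        return p >= 2 and q >= 2
    if kind == "loss":
        return p == 0 and q == 0
    raise ValueError(f"unknown kind: {kind!r}")
```

For a toy genome with `lengths = {"chrA": 100, "chrB": 300}`, the gain threshold for chrA is 2 × 300/400 = 1.5: gaining chrA inflates the library's total read count, so every bin's expected count rises and the observed depth of the gained chromosome falls below 2.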
Chromosome gains’ division of origin
To see when chromosome gains originated, we determined whether the
centromeres of the multiple copies of the chromosomes were heterozygous and
therefore from homologs, which typically disjoin in meiosis I (MI), or
homozygous and therefore from sister chromatids, which typically disjoin in
meiosis II (MII). We identified heterozygous regions for all cells using a
Hidden Markov Model (HMM) in which the states are 1) heterozygous (emitting
either haplotype’s alleles) or 2) homozygous (emitting only one
haplotype’s alleles), with transition probability between the states
equal to the recombination transition probability. For each gain, we determined
whether heterozygous tracts overlapped the centromere. If a heterozygous tract
1) started before the start of the centromere and ended after the end of the
centromere or 2) started at the first SNP observed on an acrocentric chromosome
or within the first 10 SNPs and was more than 10 SNPs long, the chromosome was
classified as an MI gain; if no heterozygous tract overlapped the centromere, it
was classified as an MII gain. (See our scripts getDiploidTracts_hmm.R,
originOfGainID.R, and
curateOriginMultSamps.R.)
At the sex chromosomes, any XY sex chromosome gain derives from MI (X
and Y are homologs), whereas an XX or YY gain derives from MII (sister
chromatids duplicated).
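The MI/MII classification of a gain, given heterozygous tracts from the two-state HMM, can be sketched as below. Assumptions: tracts are recorded with SNP indices (0 = first SNP observed on the chromosome) and base-pair bounds, and the "more than 10 SNPs long" condition applies to both acrocentric start conditions. `Tract`, `gain_origin`, and `sex_gain_origin` are hypothetical names.

```python
from typing import List, NamedTuple


class Tract(NamedTuple):
    first_snp: int  # index of the tract's first SNP (0 = first SNP observed)
    last_snp: int   # index of the tract's last SNP (inclusive)
    start_bp: int
    end_bp: int


def gain_origin(het_tracts: List[Tract], cen_start: int, cen_end: int,
                acrocentric: bool) -> str:
    """'MI' if a heterozygous tract covers the centromere (or, on an acrocentric
    chromosome, starts within the first 10 SNPs and is > 10 SNPs long); else 'MII'."""
    for t in het_tracts:
        spans_centromere = t.start_bp < cen_start and t.end_bp > cen_end
        n_snps = t.last_snp - t.first_snp + 1
        near_acro_start = acrocentric and t.first_snp < 10 and n_snps > 10
        if spans_centromere or near_acro_start:
            return "MI"
    return "MII"


def sex_gain_origin(karyotype: str) -> str:
    """XY gains separate homologs (MI); XX or YY gains separate sister chromatids (MII)."""
    return {"XY": "MI", "XX": "MII", "YY": "MII"}[karyotype]
```

On acrocentric chromosomes the centromere lies before the first informative SNP, which is why a heterozygous tract at the start of SNP coverage stands in for a tract spanning the centromere.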
Characterization of egg-mimic sperm preparation and optimization of
bead-based single-sperm sequencing.
a-c, Two-channel fluorescence plots showing the results
of droplet digital PCR (ddPCR) with input template noted in each title,
demonstrating that two loci (from different chromosomes) are detectable in
the same droplet far more often when sperm DNA florets (rather than purified
DNA) are used as input. Each point represents one droplet. Gray points in
the bottom left quadrant represent droplets in which neither template
molecule was detected; blue points in the top left quadrant represent
droplets in which the assay detected a template molecule for the locus on
chromosome 7; green points in the bottom right quadrant represent droplets
in which the assay detected a template molecule for the locus on chromosome
10; and brown points in the top right quadrant represent droplets in which
both loci were detected. With a high concentration of purified DNA as input
(a), comparatively fewer droplets contain both loci than
when untreated (b) or treated (c) sperm were used
as input. Sperm “florets” treated with the egg-mimicking
decondensation protocol had a much higher fraction of droplets containing
both loci than purified DNA (compare a and c,
right, high-input treated sperm) and had more-sensitive ascertainment and
cleaner results (quadrant separation) than untreated sperm (compare
b and c, left, low-input sperm and treated
sperm). The pink lines in (b) delineate the boundaries between
droplets categorized as negative or positive for each assay. d,
Optimization of sperm preparation: Characterization of the effect of
different lengths of 37°C incubation of sperm cells treated with
egg-mimicking decondensation reagents on how often the loci on chromosomes 7
and 10 were detected in the same ddPCR droplet. Y axis, the percentage of
molecules calculated to be linked to each other (i.e.,
physically linked in the input) for assays targeting chromosomes 7 and 10.
Extracted DNA (a negative control) gives the expected result of random
assortment of the two template molecules into droplets (first bar). The
45-minute heat treatment was used for all subsequent experiments in this
study. e and f, Distribution of sequence reads
across cell barcodes from droplet-based single-sperm sequencing. Each panel
shows the cumulative fraction (y-axis) of all reads from a sequencing run
coming from each read-number-ranked cell barcode; a sharp inflection point
delineates the barcodes with many reads from those with few reads. Points to
the left of the inflection point are the cell barcodes that were associated with
many reads (i.e., beads that co-encapsulated with cells); the height of the
inflection point reflects the proportion of the sequence reads that come
from these barcodes. Only reads that mapped to the human genome (hg38) and
were not PCR duplicates are included. e, Data from an initial
adaptation of 10X Genomics’ GemCode linked reads system[29] where a small proportion
of the reads come from cell barcodes associated with putative cells.
f, Data from the final, implemented adaptation of 10X
Genomics’ GemCode linked reads system[29] for the same number of input sperm
nuclei as in e. Note that the x-axis here spans one-fifth as many
barcodes as in (e).
Evaluation of chromosomal phasing and identification of cell
doublets.
a, Phasing strategy. Green and purple denote the
chromosomal phase of each allele (unknown before analysis). Each sperm cell
carries one parental haplotype (green or purple) except where a
recombination event separates consecutively observed SNPs (red
“X” in bottom sperm). Because alleles from the same haplotype
will tend to be observed in the same sperm cells, the haplotype arrangement
of the alleles can be assembled at whole-chromosome scale. b,
Evaluation of our phasing method using 1,000 simulated single-sperm genomes
(generated from two a priori known parental haplotypes and
sampled at various levels of coverage). Since cell doublets (which combine
two haploid genomes and potentially two haplotypes at any region) can in
principle undermine phasing inference, we included cell doublets in the
simulation (in proportions shown on the X axis, which bracket the observed
doublet rates). Each point shows the proportion of SNPs phased concordantly
with the correct (a priori known) haplotypes (Y axis) for
one simulation (five simulations were performed for each combination of
cell-doublet proportion and percentage of sites observed). c,
Relationship of phasing capability to number of cells analyzed. Data are as
in (b), but for different numbers of simulated cells. All
simulations had an among-cell mean of 1% of heterozygous sites observed.
d, A cell doublet: when two cells (here, sperm DNA florets)
are co-encapsulated in the same droplet, their genomic sequences will be
tagged with the same barcode; such events must be recognized computationally
and excluded from downstream analyses. e, Four example
chromosomes from a cell barcode associated with two sperm cells (a cell
doublet). Black lines: haplotypes; blue circles: observations of alleles,
shown on the haplotype from which they derive. Both parental haplotypes are
present across regions of chromosomes where the cells inherited different
haplotypes. f, Computational recognition of cell doublets in
Sperm-seq data (from an individual sperm donor, NC11). The proportion of
consecutively observed SNP alleles derived from different parental
haplotypes is used to identify cell doublets; this proportion is generally
small (arising from sparse crossovers, PCR/sequencing errors, and/or ambient
DNA) but is much higher when the analyzed sequence comes from a mixture of
two distinct haploid genomes. We use 21 of the 22 autosomes to calculate
this proportion, excluding the autosome with the highest such proportion
given the possibility that a chromosome is aneuploid. The dashed gray line
marks the inflection point beyond which sperm genomes are flagged as
potential doublets and excluded from downstream analysis. Red points
indicate barcodes with coverage of both the X and Y chromosomes (potentially
X+Y cell doublets or XY aneuploid cells); black points indicate barcodes
with one sex chromosome detected (X or Y). The red (XY) cells below the
doublet threshold are XY aneuploid but appear to have just one copy of each
autosome.
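The simulation in panels b and c (described in Methods, “Chromosome-scale phasing”) can be sketched with the standard library. This is not the authors' script simulatespermseqfromhaps.py. Assumptions: the recombination map is reduced to a weight per interval between adjacent sites, the starting haplotype is chosen with probability 1/2, and each cell observes a fixed fraction of sites. Doublets merge two cells' observations and keep 70% at random, so a site can appear with both alleles. Function names are hypothetical.

```python
import random


def simulate_cell(hap1, hap2, interval_weights, frac_observed,
                  error_rate=0.02, rng=random):
    """hap1, hap2: alleles (0/1) at the same heterozygous sites.
    interval_weights: len(hap1) - 1 weights for a crossover between sites k and k + 1.
    Returns sorted (site, allele) observations for one simulated sperm."""
    n = len(hap1)
    k = rng.choices(range(n - 1), weights=interval_weights)[0]  # one crossover per cell
    first, second = (hap1, hap2) if rng.random() < 0.5 else (hap2, hap1)
    sites = rng.sample(range(n), max(1, round(frac_observed * n)))
    observed = []
    for i in sorted(sites):
        allele = first[i] if i <= k else second[i]
        if rng.random() < error_rate:
            allele = rng.choice((0, 1))  # simulated PCR/sequencing error: random allele
        observed.append((i, allele))
    return observed


def make_doublet(cell_a, cell_b, keep=0.7, rng=random):
    """Combine two cells and retain a random 70% of their observed sites."""
    merged = cell_a + cell_b
    return sorted(rng.sample(merged, round(keep * len(merged))))
```

Passing `rng=random.Random(seed)` makes each of the repeated simulations reproducible.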
Identification and use of “bead doublets.”
a, SNP alleles were inferred genome-wide (for each
sperm genome) by imputation from (i) the subset of alleles detected in each
cell and (ii) Sperm-seq-inferred parental haplotypes. For each pair of sperm
genomes (cell barcodes), the proportion of all SNPs at which they shared the
same imputed allele was estimated. A small but surprising number of such
pairwise comparisons (19 of 984,906 from the donor shown, NC14) indicate
essentially identical genomes (ascertained through different SNPs).
b, We hypothesize that this arises from a heretofore
undescribed scenario we call “bead doublets”, in which two
barcoded beads have co-encapsulated with the same gamete and whose barcodes
therefore tagged the same haploid genome. c, Random pairs of
cell barcodes (here 100 pairs selected from donor NC10) tend to interrogate
few of the same SNPs (left), and tend to detect the same parental haplotype
on average at the expected 50% of the genome (right). d,
“Bead doublet” barcode pairs (here 20 pairs from
donor NC10, who had the median number of bead doublets, left) also
interrogate few of the same SNPs, yet detect identical haplotypes throughout
the genome (right). Results were consistent across donors. e,
Use of “bead doublets” to characterize the concordance of
crossover inferences between distinct samplings of the same haploid genome
by different barcodes. The bead doublets (barcode pairs) were compared to
100 random barcode pairs per donor. Crossover inferences were classified as
“concordant” (overlapping, detected in both barcodes), as
“one SNP apart” (separated by just one SNP, detected in both
barcodes), as “near end of coverage” (within 15 heterozygous
SNPs of the end of SNP coverage at a telomere, where power to infer
crossovers is partial), or as discordant. Error bars (with small magnitude)
show binomial 95% confidence intervals for the number of crossovers per
category divided by number of crossovers total in both barcodes (32,714
crossovers total in 1,201 bead doublet pairs; 67,862 crossovers total in
2,000 random barcode pairs; some barcodes are in multiple bead doublet or
random barcode pairs).
Numbers and locations of crossovers called from down-sampled data (equal
number of SNPs in each cell, randomly chosen).
To eliminate any potential effect of unequal sequence coverage
across donors and cells, down-sampling was used to create data sets with
equal coverage (numbers) of heterozygous SNP observations in each cell.
Crossovers were called from these random equally sized sets of SNPs from all
cells. a and b, Crossover number per cell globally
(a) and per chromosome (b) (785,476 total
autosomal crossovers called from down-sampled SNPs included, 30,778 cells
included, aneuploid chromosomes excluded). c, Density plots of
crossover location with crossover midpoints plotted and area scaled to be
equal to per-chromosome crossover rate. Gray rectangles mark centromeric
regions; coordinates are in hg38. d, Similar numbers of
crossovers were called from full data and equally down-sampled SNP data: we
performed correlation tests across cells for each donor and chromosome to
compare the number of crossovers called from all data to the number of
crossovers called from equal numbers of randomly down-sampled SNPs. The
histogram shows Pearson’s r values for all 460 (20
donors x 23 chromosomes [total number plus number for 22 autosomes]) tests
(n per test = 974–2,274 cells per donor as in
Extended Data Table 1, all
chromosome comparisons Pearson’s r > 0.83,
all two-sided p < 10^−300).
e, Crossovers called from equally down-sampled SNP data
were in similar locations to those called from all data: we performed
correlation tests comparing crossover rate in 500 kb bins (cM/500 kb) from
all data vs. equally down-sampled SNP data for each donor and chromosome.
The histogram shows Pearson’s r values for all 460
(20 donors x 23 chromosomes [genome-wide rate plus rate for 22 autosomes])
tests (n per test = number of 500 kb bins per chromosome
[genome-wide: 5,739, chromosomes 1 through 22: 497, 484, 396, 380, 363, 341,
318, 290, 276, 267, 270, 266, 228, 214, 203, 180, 166, 160, 117, 128, 93,
101], all chromosome comparisons Pearson’s r
> 0.87, all two-sided p <
10^−300).
Inter-individual and inter-cell recombination rate from single-sperm
sequencing.
a, Density plot showing per-cell number of autosomal
crossovers for all 31,228 cells (813,122 total autosomal crossovers) from 20
sperm donors (per-donor cell and crossover numbers as in Extended Data Table 1; aneuploid chromosomes
were excluded from crossover analysis). Colors represent a donor’s
mean crossover rate (crossovers per cell) from low (blue) to high (red).
This same mean recombination rate-derived color scheme is used for donors in
all figures. Recombination rate differs among donors (n =
20, Kruskal–Wallis chi-squared = 3,665, df = 19,
p < 10^−300). b,
Per-chromosome crossover number in each of the 20 sperm donors (data as in
(a) but shown for individual chromosomes). c,
Per-chromosome genetic map lengths for: (i) each of the 20 sperm donors, as
inferred from Sperm-seq data (colors from blue to red reflect donors’
individual crossover rates as described above); (ii) a male average, as
estimated from pedigrees by deCODE[6] (yellow triangles); (iii) a population average
(including female meioses, which have more crossovers), as estimated from
HapMap data[7] (yellow
circles). The deCODE genetic maps stop 2.5 Mb from the ends of SNP coverage.
d, Physical vs. genetic distances (for individualized sperm
donor genetic maps and deCODE’s paternal genetic map) plotted at 500
kb intervals (hg38). Gray boxes denote centromeric regions (or centromeres
and acrocentric arms). Sperm-seq maps are broadly concordant with deCODE
maps (correlation test results in Supplementary Notes) except at
subtelomeric regions not included in deCODE’s map.
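Because each sperm is a single meiotic product, a genetic map follows directly from crossover counts: 1 cM corresponds to a 1% chance of a crossover per gamete, so the map length of a region in cM is 100 × (crossovers in the region) / (number of gametes). The sketch below bins crossover midpoints into 500 kb windows as in panel d; the function name is hypothetical.

```python
import numpy as np


def genetic_map_cM(crossover_midpoints_bp, n_cells, chrom_length_bp, bin_bp=500_000):
    """Return bin starts (bp), cM per bin, and the cumulative genetic map (cM)."""
    edges = np.arange(0, chrom_length_bp + bin_bp, bin_bp)
    counts, _ = np.histogram(crossover_midpoints_bp, bins=edges)
    cM_per_bin = 100.0 * counts / n_cells
    return edges[:-1], cM_per_bin, np.cumsum(cM_per_bin)
```

For example, four crossovers observed across four gametes on a 2 Mb chromosome give a 100 cM map (one crossover per gamete on average), distributed across bins in proportion to where the crossovers fell.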
Distributions of crossover locations along chromosomes (in
“crossover zones”).
a, Each donor’s crossover locations are plotted
as a colored line; color indicates the donor’s overall crossover rate
(blue: low, red: high); gray boxes show the locations of centromeres (or,
for acrocentric chromosomes, centromeres and p arms). In all
analyses, each crossover was positioned at the midpoint between the two SNPs
bounding it. To combine data across chromosomes, crossover locations
(density plot) are shown on “meta-chromosomes”, on which each crossover’s
location is normalized to the length of the chromosome or arm on which it
occurred. For acrocentric chromosomes, only the q arm was considered. For
non-acrocentric chromosomes, the p and q arms were each allotted space in
proportion to their share of the total length (in bp) of the non-acrocentric
chromosomes, so that the centromere sits at the summed p arms’ fraction of
that length. Each crossover location was first converted to its fractional
position along its arm, and this fraction was then rescaled to the genome-wide
p- or q-arm proportion. b,
Identification of chromosomal zones of recombination use (“crossover
zones”) from all donors’ crossovers for 22 autosomes. Density
plots of crossover location for all sperm donors’ total 813,122
crossovers (aneuploid chromosomes excluded; crossover location is the
midpoint between SNPs bounding crossovers) along autosomes (hg38) are shown.
Crossover zones (bounded by local minima of crossover density) are shown by
alternating shades of gray. Diagonally-hatched rectangles indicate
centromeres (or centromeres and acrocentric arms).
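The two computations in this figure, placing crossovers on a meta-chromosome (panel a) and splitting chromosomes into crossover zones bounded by local minima of crossover density (panel b), can be sketched as follows. This is a minimal illustration under stated assumptions: each non-acrocentric chromosome is summarized by its length and a single centromere coordinate (real centromeres are regions), arm fractions are measured from the p-arm end, and the density is an unnormalized Gaussian kernel evaluated on an evenly spaced grid. The kernel, bandwidth and grid step are assumptions, since the legend does not say how the density was smoothed, and the `Chrom` class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Chrom:
    length: int      # chromosome length in bp
    centromere: int  # one bp coordinate standing in for the centromere (a simplification)


def crossover_midpoint(left_snp, right_snp):
    """Crossover position: midpoint between the two SNPs bounding the crossover."""
    return (left_snp + right_snp) / 2


def meta_p_fraction(chroms):
    """Genome-wide p-arm share: summed p-arm bp / summed bp of non-acrocentric chromosomes."""
    return sum(c.centromere for c in chroms) / sum(c.length for c in chroms)


def to_meta(pos, chrom, p_frac):
    """Map a bp position to a meta-chromosome coordinate in [0, 1] (0 = p-arm end)."""
    if pos < chrom.centromere:
        return (pos / chrom.centromere) * p_frac  # fraction along p arm, rescaled
    q_fraction = (pos - chrom.centromere) / (chrom.length - chrom.centromere)
    return p_frac + q_fraction * (1.0 - p_frac)


def kernel_density(positions, grid, bandwidth):
    """Unnormalized Gaussian kernel density of crossover positions at each grid point."""
    return [sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in positions)
            for g in grid]


def crossover_zones(positions, start, end, step, bandwidth):
    """Split [start, end] into zones bounded by strict local minima of the density."""
    n_points = int((end - start) / step) + 1
    grid = [start + i * step for i in range(n_points)]
    density = kernel_density(positions, grid, bandwidth)
    cuts = [grid[i] for i in range(1, len(grid) - 1)
            if density[i] < density[i - 1] and density[i] < density[i + 1]]
    edges = [start] + cuts + [end]
    return list(zip(edges[:-1], edges[1:]))
```

With toy chromosomes `[Chrom(100, 40), Chrom(200, 60)]`, the meta p-arm share is 100/300 = 1/3, so a crossover halfway along the q arm of the second chromosome maps to 1/3 + 0.5 × 2/3 = 2/3. Likewise, `crossover_zones([9, 10, 11, 29, 30, 31], 0, 40, 1, 2)` returns `[(0, 20), (20, 40)]`: two clusters of crossovers are split at the density minimum midway between them.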
Crossover placement in end zones, and crossover separation, vary in ways
that correlate with crossover rate – among sperm donors and among
individual gametes.
Analyses are shown by donor (a-h,
n = 20 sperm donors) or by individual gamete
(i-j, n = 31,228 gametes). In
a-h, the left panels show the phenotype distributions for
individual donors, and the right panels show the relationship to the
donors’ crossover rates. To control for the effect of the number of
crossovers, the analyses in panels c, d, and g-j
use “two-crossover chromosomes” – chromosomes on which
exactly two crossovers occurred. For scatter plots (a-h,
right), all x axes show mean crossover rate and all error bars are 95%
confidence intervals (y axes are described per panel). a and
b, The proportion of crossovers falling in the most distal
chromosome crossover zones (a) and crossover separation
(b; the distance in Mb between consecutive crossovers, a readout of
crossover interference) vary among the 20 sperm
donors (left panels; per-cell proportion of crossovers in end zones,
among-donor Kruskal–Wallis chi-squared = 2,334, df =
19, p < 10^−300; all distances
between consecutive crossovers, among-donor Kruskal–Wallis chi-squared
= 3,309, df = 19, p <
10^−300). Right panels show both properties (y axes,
total proportion of crossovers in distal zones and median crossover
separation, respectively) vs. donor’s crossover rate (correlation
results for 20 sperm donors: proportion of all crossovers across cells in
distal zones, Pearson’s r = −0.95, two-sided
p = 2 × 10^−10; median crossover separation,
Pearson’s r = −0.96, two-sided
p = 1 × 10^−11).
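Crossover separation in panels b and e-h is the distance between consecutive crossovers on the same chromosome, summarized per donor (for example, by its median) and then correlated with the donor's crossover rate. A minimal sketch, assuming each cell's crossover positions (in Mb) are stored as a dict from chromosome name to positions; the data layout and function names are hypothetical, and only Pearson's r (not its p value) is computed.

```python
import math
import statistics


def consecutive_distances(crossovers_by_chrom):
    """Distances between consecutive crossovers on each chromosome of one cell."""
    distances = []
    for positions in crossovers_by_chrom.values():
        ordered = sorted(positions)
        distances.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return distances


def donor_median_separation(cells):
    """Median crossover separation for one donor, pooling distances across its cells."""
    pooled = [d for cell in cells for d in consecutive_distances(cell)]
    return statistics.median(pooled)


def pearson_r(xs, ys):
    """Pearson correlation, e.g. donor crossover rate vs. median crossover separation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

A chromosome with a single crossover contributes no distance, which is why restricting to two-crossover chromosomes (panels g and h) gives exactly one distance per chromosome.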
c, An alternative measure of the proportion of crossovers
in the distal regions of chromosomes: the proportion of crossovers in the distal
50% of chromosome arms varies across donors (left, among-donor
Kruskal–Wallis chi-squared = 2,209, df = 19,
p < 10^−300) and negatively
correlates with recombination rate (right, Pearson’s
r = −0.92, two-sided p = 2
× 10^−8; y axis shows actual proportion of
crossovers in distal 50%). d, As in (c), but with
the proportion of crossovers from two-crossover chromosomes occurring in the
distal 50% of chromosome arms. Left, among-donor Kruskal–Wallis
chi-squared = 1,058, df = 19, p = 2
× 10^−212; right, correlation with recombination
rate Pearson’s r = −0.93, two-sided
p = 4 × 10^−9. e, As in
(b) but for consecutive crossovers on the
q arm of the chromosome. Left, among-donor
Kruskal–Wallis chi-squared = 346, df = 19,
p = 7 × 10^−62; right,
correlation with recombination rate Pearson’s r =
−0.90, two-sided p = 5 × 10^−8.
f, As in (b) but for consecutive crossovers on
opposite chromosome arms (i.e. that span the centromere).
Left, among-donor Kruskal–Wallis chi-squared = 1,554,
df = 19, p <
10^−300; right, correlation with recombination rate
Pearson’s r = −0.96, two-sided
p = 3 × 10^−11. g, As in
(e) but for distances between consecutive crossovers on
two-crossover chromosomes. Left, among-donor Kruskal–Wallis
chi-squared = 181, df = 19, p = 2 ×
10^−28; right, correlation with recombination rate
Pearson’s r = −0.88, two-sided
p = 3 × 10^−7. h, As in
(f) but for distances between consecutive crossovers on
two-crossover chromosomes. Left, among-donor Kruskal–Wallis
chi-squared = 930, df = 19, p = 5 ×
10^−185; right, correlation with recombination rate
Pearson’s r = −0.92, two-sided
p = 1 × 10^−8. i,
j, Boxplots show medians and interquartile ranges with
whiskers extending to 1.5 times the interquartile range from the box. Each
point is a cell. i, Within-donor percentile of the proportion of
crossovers from two-crossover chromosomes falling in distal zones, plotted
vs. crossover rate decile. Groups are deciles of crossover rate, normalized
by converting each cell’s crossover count to a within-donor percentile
(all cells from all donors shown together; n
cells in deciles = 3,152, 3,122, 3,276, 3,067, 3,080, 3,073, 3,135, 3,132,
3,090, 3,101, respectively [31,228 total]). Because the underlying data are
proportions with small denominators, their discreteness is evident as pileups
at certain values. j, Crossover interference from two-crossover
chromosomes (median consecutive crossover separation per cell shown). Each
point represents the median of all percentile-expressed distances between
crossovers from all two-crossover chromosomes in one cell (percentiles taken
within each chromosome); groupings and ns as in
(i).
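The within-donor normalization in panels i and j converts each cell's crossover count to a percentile among that donor's own cells, so that cells from donors with different mean rates can be pooled into common deciles. A minimal sketch, assuming ties receive their average rank and that deciles are formed as ceiling(percentile / 10); both conventions are assumptions, as are the function names.

```python
import math
from collections import defaultdict


def within_donor_percentiles(cells):
    """cells: list of (donor_id, crossover_count) pairs, one per cell.

    Returns each cell's percentile in (0, 100] among its own donor's cells,
    with tied counts given their average rank. O(n^2) per donor: fine for a sketch.
    """
    counts_by_donor = defaultdict(list)
    for donor, count in cells:
        counts_by_donor[donor].append(count)
    percentiles = []
    for donor, count in cells:
        counts = counts_by_donor[donor]
        below = sum(c < count for c in counts)
        tied = sum(c == count for c in counts)  # includes the cell itself
        average_rank = below + (tied + 1) / 2
        percentiles.append(100.0 * average_rank / len(counts))
    return percentiles


def decile(percentile):
    """Decile 1-10 from a percentile in (0, 100]."""
    return min(10, max(1, math.ceil(percentile / 10)))
```

Because crossover counts are integers, many cells within a donor share a count; average ranks keep tied cells in the same decile, which is one reason decile sizes need not be exactly equal.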
Crossover interference in individual sperm donors and on
chromosomes.
a, Solid lines show density plots (scaled by
donor’s crossover rate) of the observed distance (separation) between
consecutive crossovers as measured in the proportion of the chromosome
separating them (left) and in genomic (Mb) distance (right), one line per
donor (n = 20). Dashed lines show the distance between
consecutive crossovers when crossover locations are permuted randomly across
cells to remove the effect of crossover interference. b, The
median of observed distances between consecutive crossovers for one donor
(NC18, 10th lowest recombination rate of 20 donors; blue dashed
line) is shown with a histogram of the medians of n =
10,000 among-cell crossover permutations (both permutation one-sided
ps < 0.0001). Units, proportion of the
chromosome (left) and genomic (Mb) distance (right). c,
Crossover separation on example chromosomes; plots and ns
are as in (b). (Permutation one-sided p
< 0.0001 for all chromosomes in all sperm donors, except occasionally for
chromosome 21, on which especially few double crossovers occur.)
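The among-cell permutation in panels a-c can be sketched as below: crossover positions on one chromosome are pooled across cells and reshuffled while each cell keeps its observed number of crossovers, which removes any within-cell spacing (interference) but preserves the overall distribution of crossover locations. This is a minimal sketch, assuming positions are given per cell for a single chromosome and that at least one cell has two or more crossovers; the one-sided p value is taken as the fraction of permutations whose median separation is at least the observed one, a convention assumed here. Function names are hypothetical.

```python
import random
import statistics


def median_separation(cells):
    """Median distance between consecutive crossovers, pooled over cells (one chromosome)."""
    distances = []
    for positions in cells:
        ordered = sorted(positions)
        distances.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return statistics.median(distances)


def interference_permutation_test(cells, n_perm=10_000, seed=0):
    """Observed median separation and a one-sided permutation p value.

    Positions are pooled across cells and reshuffled; each cell keeps its own
    crossover count, so only which positions share a cell is randomized.
    """
    rng = random.Random(seed)
    observed = median_separation(cells)
    pooled = [p for positions in cells for p in positions]
    sizes = [len(positions) for positions in cells]
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted, i = [], 0
        for k in sizes:
            permuted.append(pooled[i:i + k])
            i += k
        if median_separation(permuted) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_perm
```

With 10,000 permutations, a result in which no permuted median reaches the observed one gives an estimate of 0, which would be reported as p < 0.0001.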
d, Median distances between donor NC18’s consecutive
crossovers for each autosome for all inter-crossover distances (top) and
inter-crossover distances only from chromosomes with two crossovers
(bottom). Units are proportion of the chromosome (left) and genomic (Mb)
distance (right). e, Schematic: analyzing crossover
interference in individualized genetic distance (one 20 cM window shown)
using a donor’s own recombination map. f, When
parameterized using each donor’s own genetic map, sperm
donors’ crossover interference profiles across multiple genetic
distance windows (as shown in e) do not differ (n
= 20 sperm donors, Kruskal–Wallis chi-squared = 0.22,
df = 19, p = 1 using 20 estimates [cM
distances] for each of 20 donors). Error bars, binomial 95% confidence
intervals on proportion of cells with a second crossover in the window
given. This suggests that inter-individual variation in crossover
interference, while substantial when measured in base pairs, is negligible
when measured in donor-specific genetic distance, pointing to a shared
influence upon crossover interference and crossover rate.
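Panels e and f measure interference in each donor's own genetic distance. One plausible reading, sketched below under stated assumptions, is to convert crossover positions to centimorgans by linear interpolation along the donor's map, ask whether the next crossover after the first on a chromosome lies within a given cM window, and attach a binomial 95% confidence interval to the resulting proportion. The legend does not say which binomial interval was used; the Wilson score interval here is our choice, and all names and the map layout are hypothetical.

```python
import bisect
import math


def bp_to_cm(pos, map_bp, map_cm):
    """Linearly interpolate a donor's genetic map (sorted bp positions -> cumulative cM)."""
    if pos <= map_bp[0]:
        return map_cm[0]
    if pos >= map_bp[-1]:
        return map_cm[-1]
    j = bisect.bisect_right(map_bp, pos)
    x0, x1 = map_bp[j - 1], map_bp[j]
    y0, y1 = map_cm[j - 1], map_cm[j]
    return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)


def second_crossover_in_window(cells, window_cm, map_bp, map_cm):
    """(hits, total): chromosomes whose next crossover after the first lies within window_cm."""
    hits = total = 0
    for positions in cells:
        if not positions:
            continue  # no first crossover to condition on
        cm = sorted(bp_to_cm(p, map_bp, map_cm) for p in positions)
        total += 1
        if len(cm) > 1 and cm[1] - cm[0] <= window_cm:
            hits += 1
    return hits, total


def wilson_interval(k, n, z=1.96):
    """Wilson score 95% interval for a binomial proportion k / n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

Because the window is expressed in each donor's own cM rather than in Mb, a donor with a high crossover rate has more cM per Mb, so the same 20 cM window spans fewer base pairs; this is the rescaling under which the donors' interference profiles no longer differ.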
Relationships of aneuploidy frequency to chromosome size and
recombination.
a, The across-donor per-cell frequency of chromosome
losses (left) and gains (center), plotted against the length of the
chromosome (hg38; for losses across n = 22 chromosomes,
Pearson’s r = −0.29, two-sided
p = 0.19; for gains across n = 22
chromosomes, Pearson’s r = −0.23, two-sided
p = 0.30). Right, the per-chromosome excess of losses
over gains (number of losses minus number of gains, divided by number of
cells) plotted against the length of the chromosomes (across
n = 22 chromosomes, Pearson’s r
= −0.29, two-sided p = 0.19). Red labels,
acrocentric chromosomes. Error bars, 95% binomial confidence intervals on
per-cell frequency (number of events / number of cells, all 31,228 cells
included). b-d, Relationship between aneuploidy frequency and
recombination. Only autosomal whole-chromosome aneuploidies are included.
b, Left, total number of crossovers on MI-nondisjoined
chromosomes (blue line; crossovers on the gained chromosome were called as
transitions between the presence of one haplotype and of both haplotypes)
compared to n = 10,000 donor- and chromosome-matched sets
(35 × 2 chromosomes per set) of properly segregated chromosomes (gray
histogram; permutation). (54 total crossovers on MI gains vs. 84.2 mean
total crossovers on sets of matched chromosomes; one-sided permutation
p < 0.0001 for the hypothesis that gained
chromosomes have fewer crossovers.) Right, as left but for gains occurring
during MII (71 MII-derived, single-copy, whole-chromosome gains from all
donors, restricted to gained chromosomes with fewer than 5 called crossovers).
(One-sided permutation p = 0.98 for MII from
n = 10,000 permutations, for the hypothesis that gained
chromosomes have fewer crossovers. Sister chromatids nondisjoined in MII
capture all crossovers, whereas matched chromosomes do not: matched
simulations and homologs nondisjoined in MI capture only a random half of the
crossovers occurring on that chromosome in the parent spermatocyte.)
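The permutation comparison in panel b draws donor- and chromosome-matched sets of properly segregated chromosomes, two per MI-gained chromosome (35 × 2 per set), because an MI gain carries both homologs. A minimal sketch of that logic, assuming per-(donor, chromosome) pools of crossover counts are available and that matched chromosomes are drawn without replacement within each pool; the sampling details, data layout and function names are assumptions, and the toy inputs below only illustrate the test, not the reported 54 vs. 84.2 comparison.

```python
import random


def matched_null_totals(gains, pools, n_sets=10_000, seed=0):
    """Null distribution of total crossovers for sets matched to MI-gained chromosomes.

    gains: one (donor, chromosome) key per MI-gained chromosome.
    pools: (donor, chromosome) -> crossover counts on properly segregated copies
           of that chromosome from the same donor (each pool needs >= 2 entries).
    Each set draws two matched chromosomes per gain, one per gained homolog.
    """
    rng = random.Random(seed)
    return [sum(sum(rng.sample(pools[key], 2)) for key in gains)
            for _ in range(n_sets)]


def one_sided_p_fewer(observed, null_totals):
    """Fraction of matched sets with no more crossovers than observed."""
    return sum(t <= observed for t in null_totals) / len(null_totals)
```

A small fraction means gained chromosomes carry fewer crossovers than matched, properly segregated ones, as expected if crossovers protect against MI nondisjunction.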
c, Crossovers per non-aneuploid megabase from each cell
from each donor, split by aneuploidy status (n cells = 498,
50, 92, 30,609, left-to-right; “euploid” excludes cells with
any autosomal whole- or partial-chromosomal loss or gain and
“gains” includes gains of one or more than one chromosome
copy; Mann–Whitney test W = 7,264,117, 722,191,
1,370,376; two-sided p = 0.07, 0.49, 0.66 for all autosomal
aneuploidies, meiosis I (MI) gains, and meiosis II (MII) gains,
respectively, all compared against euploid). Each cell is one point;
boxplots show medians and interquartile ranges with whiskers extending to
1.5 times the interquartile range from the box. d, Per-cell
crossover rates vs. per-cell aneuploidy (loss and gain) rates,
n = 20 donors (colored by crossover rate).
p values shown in subtitles are for two-sided
Pearson’s correlation tests. Error bars are 95% confidence intervals
on mean crossover rate (x axis) and on observed aneuploidy frequency (y
axis).
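Panel c compares per-cell crossover density between aneuploid and euploid cells with a Mann–Whitney test. A minimal sketch, assuming aneuploid chromosomes are dropped from both the crossover count and the length denominator (consistent with their exclusion from crossover analysis elsewhere in these legends), and computing W as the number of (x, y) pairs with x > y plus half the tied pairs, which equals the W that R's `wilcox.test` reports for its first sample. Function names and inputs are hypothetical.

```python
import bisect


def crossovers_per_mb(crossovers_by_chrom, chrom_len_mb, aneuploid_chroms):
    """Crossovers per megabase over one cell's non-aneuploid chromosomes."""
    kept = [c for c in crossovers_by_chrom if c not in aneuploid_chroms]
    return sum(crossovers_by_chrom[c] for c in kept) / sum(chrom_len_mb[c] for c in kept)


def mann_whitney_w(x, y):
    """Count of pairs with x_i > y_j, plus 0.5 for each tied pair."""
    ys = sorted(y)
    w = 0.0
    for v in x:
        below = bisect.bisect_left(ys, v)        # y values strictly less than v
        tied = bisect.bisect_right(ys, v) - below
        w += below + 0.5 * tied
    return w
```

Sorting one sample and using binary search keeps the cost near O(n log n), which matters when one group holds about 30,000 euploid cells.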
Additional examples of non-canonical aneuploidy events detected with
Sperm-seq, including those shown in Fig.
3f.
Copy number, SNPs, haplotypes, and centromeres are plotted as in
Fig. 3a. Donor and cell identity
are noted in the panel subtitles. Coordinates are in hg38. Chromosomes 2,
20 and 21 (a) and chromosome 15 (b) are sometimes present in 3
copies in an otherwise haploid sperm cell. c, A distinct
triplication of much of chromosome 15, from ~33 Mb onward but
excluding the proximal part of the q arm, also recurs in
cells from 3 donors. d, Chromosome arm-level losses and gains: a compound
gain of the p arm and loss of the q arm (top panel) and arm-level gains,
including gains of more than one copy (bottom three panels).
Single-cell and person-to-person variation in diverse meiotic phenotypes
may be governed by variation in the physical compaction of chromosomes
during meiosis.
Previous work shows that the physical length of the same chromosome
varies among spermatocytes at the pachytene stage of meiosis, likely by
differential looping of DNA along the meiotic chromosome axis
(e.g. left column shows smaller loops, resulting in
more loops total and in greater total axis length compared to the right
column with larger loops)[15,72-75]. This physical chromosome
length is correlated across chromosomes among cells from the same
individual[21,76] and correlates with
crossover number[15,20,21,42,73,76]. This length (measured as the length of the
chromosome axis or of the synaptonemal complex, the connector of homologous
chromosomes) can vary twofold or more among a human’s
spermatocytes[21].
We propose that the same process differs on average across individuals and
may substantially explain inter-individual variation in recombination rate.
Individual 1 (left) would, in an average cell, have meiotic chromosomes that are
physically longer (less compacted) than those of individual 2
(right); one example chromosome is shown in the figure. After the first
crossover on a chromosome (likely in a distal region of a chromosome, where
synapsis typically begins in male human meiosis before spreading across the
whole chromosome[13-15]), crossover interference
prevents nearby double-strand breaks (DSBs) from becoming crossovers; DSBs
far away can become crossovers (which themselves also cause interference).
More DSBs are likely created on physically longer chromosomes, and crossover
interference occurs among non-crossover as well as crossover DSBs[77]. Crossover interference
occurs over relatively fixed physical (micron) distances[43-45,76]; these
distances encompass different genomic (Mb) lengths of DNA in different cells
or on average in different people due to variable compaction. Thus,
crossover interference tends to lead to different total numbers of crossovers
as a function of the degree of compaction, resulting in the observed negative
correlation (Fig. 2c,e) of crossover rate with crossover spacing (as
measured in base pairs). Given that the first crossover likely occurs in a
distal region of the chromosome, this model can also explain the negative
correlation (Fig. 2b,d) of crossover rate with the proportion of
crossovers in chromosome ends. Note: this figure shows the total number of
crossovers, crossover interference extent, and crossover locations for both
sister chromatids of each homolog combined; in reality, these crossovers are
distributed among the sister chromatids, making these relationships harder
to detect in daughter sperm cells and requiring large numbers of
observations to make relationships among these phenotypes clear.
Sperm donor and single-sperm sequencing characteristics and results.
As provided by sperm bank. Afr. Am., of African American ancestry; Eur., of European ancestry; As., of Asian ancestry; (?), conflicting ancestry information given.
These numbers are the total number of aneuploidy events divided by the total number of cells, multiplied by 100 (i.e., events per 100 cells); cells can have more than one event.
Sum across all cells from all sperm donors.
Median or mean across all individual cells from all sperm donors (31,228 measurements summarized).
Median or mean of aggregate metrics across samples (20 measurements summarized).
Median across all crossovers (813,122 measurements summarized).