Abstract
Meiosis, while critical for reproduction, is also highly variable and error prone: crossover rates vary among humans and individual gametes, and chromosome nondisjunction leads to aneuploidy, a leading cause of miscarriage. To study variation in meiotic outcomes within and across individuals, we developed a way to sequence many individual sperm genomes at once. We used this method to sequence the genomes of 31,228 gametes from 20 sperm donors, identifying 813,122 crossovers, 787 aneuploid chromosomes, and unexpected genomic anomalies. Different sperm donors varied four-fold in the frequency of aneuploid sperm, and aneuploid chromosomes gained in meiosis I had 36% fewer crossovers than corresponding non-aneuploid chromosomes. Diverse recombination phenotypes were surprisingly coordinated: donors with high average crossover rates also made a larger fraction of their crossovers in centromere-proximal regions and placed their crossovers closer together. These same relationships were also evident in the variation among individual gametes from the same donor: sperm with more crossovers tended to have made crossovers closer together and in centromere-proximal regions. Variation in the physical compaction of chromosomes could help explain this coordination of meiotic variation across chromosomes, gametes, and individuals.
Introduction
One way to learn about human meiosis has been to study how genomes are inherited across generations. DNA variation data are now available for millions of people and thousands of families; the locations of crossovers can be estimated from genomic segment sharing among relatives and linkage-disequilibrium patterns in populations1–5. Although these studies sample only the small number of reproductively successful gametes from individual humans, such analyses have revealed that average crossover number and crossover location vary among individual humans and associate with common variants (single nucleotide polymorphisms, SNPs) at many genomic loci4, 6–10.
Another powerful approach to studying meiosis is to directly visualize meiotic processes in individual cells. For example, technical innovations have made it possible to ascertain that homologous chromosomes in spermatocytes generally begin synapsis (their physical connection) near their telomeres11–13; to observe double-strand breaks (a subset of which progress to crossovers) by monitoring proteins that bind to such breaks14–17; and to detect adverse meiotic outcomes, such as chromosome mis-segregation18–23. Studies based on such methods have revealed substantial cell-to-cell variation, even among cells from the same individual, in features such as the physical compaction of meiotic chromosomes24–26.
More recently, human meiotic phenotypes have begun to be studied via genotyping or sequencing up to 100 gametes from one person, demonstrating that crossovers and aneuploidy can be ascertained from direct analysis of gamete genomes27–31. Despite these advances, it has not yet been possible to measure meiotic phenotypes genome-wide in many individual gametes from many people.
Results
A high-throughput single-sperm sequencing method
To this end, we developed a method called “Sperm-seq” with which the genomes of many individual sperm can be sequenced to low coverage quickly and simultaneously. To access the tightly compacted sperm genome, we decondense sperm nuclei using reagents that mimic the molecules with which the egg unpacks the sperm pronucleus (Fig. 1a, Methods). These decondensed sperm DNA “florets” are then encapsulated with barcoded beads in microfluidic droplets in which the sperm genomes are individually barcoded and amplified32. Each genomic sequence read has a barcode that reports its droplet—and thus gamete—of origin (Fig. 1a). We used this technique to sequence 31,228 sperm cells from 20 sperm donors (974-2,274 gametes per donor), sequencing a median of ∼1% of the haploid genome of each cell (Table 1); deeper sequencing allows coverage of ∼10% of a gamete’s genome.
Data from so many individual gametes made it possible to infer individuals’ allelic haplotypes along the full length of every chromosome. We first identified the heterozygous sites in each donor’s genome using the Sperm-seq sequence reads (∼40x coverage per donor, Methods). Because each sperm chromosome is a mosaic of long segments derived from one or the other parental haplotype, the chromosomal phase of heterozygous sites could be inferred from the co-appearance patterns of alleles (of different SNPs) across many sperm cells (Fig. 1b, Methods). In silico simulations and comparisons to haplotypes from population-based analyses indicated that Sperm-seq assigned alleles to haplotypes with 97.5–99.9% accuracy (Extended Data Fig. 1a, Supplemental Text). These phased haplotypes made it straightforward to identify and remove from the analysis cell “doublets,” cases in which two sperm genomes were tagged with the same cell barcode, from the presence of both parental haplotypes at multiple loci across chromosomes (Extended Data Fig. 1b-d, Methods). We also identified surprising “bead doublets,” in which two beads’ barcodes appeared to have tagged the same gamete genome, as they reported identical genome-wide haplotypes (ascertained through different SNPs) (Extended Data Fig. 2a,b, Methods). Bead doublets were useful for evaluating the replicability of Sperm-seq data and analyses, which is usually impossible to do in inherently destructive single-cell molecular studies (Extended Data Fig. 2c-e).
Recombination rate in sperm donors and sperm cells
Analysis of Sperm-seq data identifies crossover (recombination) events as transitions between parental haplotypes (Fig. 2a, Methods). We identified 813,122 crossovers in the 31,228 gamete genomes (mean 26.03 per gamete; 25,839-62,110 per sperm donor, Table 1). Crossover locations were inferred with a median resolution of 240 kb, and 9,746 (1.2%) were inferred at resolution finer than 10 kb (Table 1, Supplemental Text). In analysis of data from bead doublets, 95.6% of crossovers were detected in both cell barcodes; another 2.1% were near the ends of SNP coverage on chromosomes, where the power to detect crossovers is incomplete (Extended Data Fig. 2e). Estimates of crossover rate and location were robust to down-sampling to the same number of SNP observations in each cell (Extended Data Fig. 3, Methods).
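The idea of calling a crossover as a transition between parental haplotypes can be illustrated with a minimal sketch. It assumes error-free haplotype labels at each observed SNP, which real low-coverage data do not provide, so it is not the authors' calling procedure (see Methods for that). The function name `call_crossovers` and the toy coordinates are hypothetical.

```python
from typing import List, Tuple


def call_crossovers(positions: List[int], haplotypes: List[int]) -> List[Tuple[int, int]]:
    """Return the (left, right) SNP interval bounding each haplotype switch.

    positions  -- sorted coordinates (bp) of heterozygous SNPs observed in one cell
    haplotypes -- parental haplotype (1 or 2) of the allele seen at each SNP
    """
    intervals = []
    for i in range(1, len(positions)):
        if haplotypes[i] != haplotypes[i - 1]:
            # The crossover lies somewhere between these two flanking SNPs.
            intervals.append((positions[i - 1], positions[i]))
    return intervals


# Toy chromosome (hypothetical data): haplotype 1, then 2, then 1 again.
positions = [100_000, 450_000, 900_000, 1_300_000, 2_000_000, 2_250_000]
haplotypes = [1, 1, 2, 2, 2, 1]

crossovers = call_crossovers(positions, haplotypes)
# Resolution of each call is the distance between its flanking SNPs.
resolutions = [right - left for left, right in crossovers]
print(crossovers, resolutions)
```

In this framing, the resolution of a crossover call is set by how closely the observed SNPs flank the switch, which is why resolution varies from call to call.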
Crossovers, which create new allelic combinations, differ in genomic locations and average number among individual humans2, 3, 6, 7, 9, 10. The 20 individual sperm donors exhibited recombination rates ranging from 22.2 (95% confidence interval [CI] 22.0–22.4) to 28.1 (95% CI 27.9–28.4) crossovers per cell, consistent with earlier rate estimates from a few living children2, 6–10 or up to 100 sperm cells29, 31 (Table 1, Fig. 2b,c, Extended Data Figs. 4, 5). For each chromosome, the proportion of cells with each observed number of crossovers varied among individuals in the way predicted by their global crossover rate (Fig. 2c, Extended Data Fig. 4). The 813,122 inferred crossovers allowed us to generate genetic maps for each of the donors; these maps were broadly concordant with deCODE’s paternal genetic map previously estimated by genotyping thousands of families10 (Fig. 2d,e; Extended Data Fig. 6; Supplemental Text).
More variation was present at the single-cell level: the number of crossovers per cell typically ranged from 17 to 37 (1st and 99th percentiles, median across donors), with an across-cell standard deviation of 4.23 (median across donors). Crossover number could in principle be co-regulated nucleus-wide, as suggested by the correlation of crossover number across chromosomes observed in pedigrees9 and spermatocytes undergoing meiosis25, 26. Indeed, individual gametes with fewer crossovers in half of their genome (the odd-numbered chromosomes) tended to have fewer crossovers in the other half of their genome (Pearson’s r = 0.09, p = 8 × 10^−54 with all gametes from all donors combined after within-donor normalization; Supplemental Text). (This point estimate greatly underestimates the true correlation of crossover number across chromosomes in spermatocytes, as any co-regulation of crossover number across chromosomes would occur in the spermatocyte, whose daughter gametes each have only a 50% chance of inheriting any given parental crossover.)
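The split-genome comparison can be sketched as follows. This is a minimal illustration, assuming that "within-donor normalization" means z-scoring each cell's count against the other cells from the same donor; the exact normalization is described in the Supplemental Text and may differ. The helper names and the toy counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_within_donor(counts, donors):
    """Z-score each cell's crossover count against cells from the same donor."""
    by_donor = defaultdict(list)
    for count, donor in zip(counts, donors):
        by_donor[donor].append(count)
    centre = {d: mean(v) for d, v in by_donor.items()}
    spread = {d: pstdev(v) for d, v in by_donor.items()}
    return [(c - centre[d]) / spread[d] for c, d in zip(counts, donors)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: crossovers on odd- and even-numbered chromosomes per cell.
donors = ["A"] * 4 + ["B"] * 4
odd_counts = [12, 14, 13, 15, 10, 11, 12, 9]
even_counts = [13, 14, 12, 15, 10, 12, 11, 9]

z_odd = normalize_within_donor(odd_counts, donors)
z_even = normalize_within_donor(even_counts, donors)
r = pearson_r(z_odd, z_even)
print(r)
```

Normalizing within donor before pooling removes the between-donor difference in mean crossover rate, so that any remaining correlation reflects cell-to-cell variation.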
On any given chromosome, fewer cells had no crossovers or many crossovers than would be predicted by a model in which crossovers are independent, random events (Extended Data Fig. 7; Supplemental Text). This is consistent with biological constraint on crossover number, a major determinant of which is crossover interference (reviewed in 33, 34).
Crossover location and interference
Crossovers are distributed non-uniformly along chromosomes, in patterns that vary at both fine scales (such as their recurrence in hotspots) and large scales (such as their concentration in sub-telomeric regions in male meiosis)1, 4–7, 10, 35, 36. Although the spatial resolution of most crossover inferences was not well suited for analyzing fine-scale selection of crossover sites (e.g., hotspots), the large number of crossovers ascertained per sperm donor (25,839-62,110) made it possible to analyze variation in large-scale crossover placement.
Crossovers were concentrated in large regions of the genome (“crossover zones”) that were shared across donors (Fig. 3, Extended Data Figs. 8, 9). Zones in the sub-telomeric regions had the most crossovers, whereas regions close to the centromere had fewer crossovers, consistent with earlier findings1, 4, 6, 10, 37 (Fig. 3a). However, on the larger acrocentric chromosomes (chromosomes 13, 14, and 15), which do not perform crossovers in their p arms, each centromere-proximal zone had a crossover rate comparable to the most telomeric zone on the same chromosome.
The crossover zones with the most variable usage (across people) were all adjacent to centromeres (Fig. 3b, Extended Data Fig. 9); individuals with high recombination rates used these zones much more frequently (crossover location patterns were robust to among-donor coverage differences, Extended Data Fig. 3c,e). Of the 10 crossover zones with crossover rates correlating most strongly with global recombination rate, all but one were centromere-proximal, and the exception was separated from the centromere by only one small zone. Consequently, the proportion of crossovers in the most distal zones of the chromosomes varied strikingly among individuals (Kruskal–Wallis chi-squared = 2,334, df = 19, p < 10^−300) and was negatively correlated with recombination rate (Pearson’s r = −0.95, p = 2 × 10^−10) (Extended Data Fig. 10a).
Crossover interference, which manifests in the tendency of crossovers to be further apart than expected by chance, occurs in humans25, 31, 37–41. The effect of crossover interference was visible in each of the 20 sperm donors: the distances between consecutive crossovers were greater in the observed data than when crossover locations were permuted across cells (Extended Data Figs. 11-15). The extent of crossover interference varied greatly among individual sperm donors (Kruskal–Wallis chi-squared = 4,316, df = 19, p < 10^−300) and correlated inversely with a donor’s global recombination rate (Pearson’s r = −0.99, p = 9 × 10^−16) (Extended Data Fig. 10b).
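The comparison between observed and permuted inter-crossover distances can be sketched as below. This is an illustration under simplifying assumptions: a single chromosome, positions in Mb, and a permutation that pools all crossover positions and redistributes them while keeping each cell's crossover count. The authors' exact permutation scheme is given in their Methods and may differ; the function names and toy data are hypothetical.

```python
import random
from statistics import mean


def consecutive_distances(cells):
    """Distances between consecutive crossovers on the same chromosome, over all cells."""
    distances = []
    for crossovers in cells:
        ordered = sorted(crossovers)
        distances.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return distances


def permute_across_cells(cells, rng):
    """Shuffle crossover positions among cells, preserving each cell's crossover count."""
    pooled = [x for crossovers in cells for x in crossovers]
    rng.shuffle(pooled)
    permuted, start = [], 0
    for crossovers in cells:
        permuted.append(pooled[start:start + len(crossovers)])
        start += len(crossovers)
    return permuted


# Hypothetical toy data: 20 cells, each with two widely spaced crossovers (Mb).
cells = [[5.0 + i, 95.0 + i] for i in range(20)]
rng = random.Random(0)

observed_mean = mean(consecutive_distances(cells))
permuted_mean = mean(
    mean(consecutive_distances(permute_across_cells(cells, rng)))
    for _ in range(200)
)
print(observed_mean, permuted_mean)
```

Under interference, the observed distances exceed the permuted ones, because permutation allows two crossovers from the same chromosomal region to land in one cell.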
We estimated crossover placement and interference from the 180,738 chromosomes with exactly two crossovers to determine whether the relationships between these meiotic phenotypes and crossover rate were simply trivial consequences of the number of crossovers observed on a chromosome (Fig. 4a). In addition to capturing the effects of a cell or donor’s underlying meiotic proclivity rather than detected crossover number, this analysis includes the effect of any crossovers that occurred in the parent spermatocyte on the detected two-crossover chromosome’s non-observed sister chromatid. On two-crossover chromosomes, end-zone usage (Fig. 4b) and crossover separation (Fig. 4c) varied across individuals (Kruskal–Wallis chi-squared = 1,034, df = 19, p = 10^−207 and Kruskal–Wallis chi-squared = 1,820, df = 19, p < 10^−300, respectively) and correlated strongly and negatively with the donor’s genome-wide recombination rate (Pearson’s r = −0.95, p = 8 × 10^−11 and Pearson’s r = −0.90, p = 5 × 10^−8, respectively; additional control analyses described in Supplemental Text). These relationships indicate that inter-individual variation in recombination rates is a proxy for other meiotic phenotypes, including crossover interference and position preference.
Single-cell analysis makes it possible to see how cellular phenotypes relate to one another, both across donors and across individual cells from the same donor. An intriguing possibility is that the same relationships generate variation at both the single-cell and person-to-person levels. To investigate this idea, we looked for connections between crossover rate and other crossover phenotypes among individual sperm cells, asking whether cells with more or fewer crossovers than the average for their donor exhibited distinct crossover interference and crossover-position-preference phenotypes. On two-crossover chromosomes, cells with more crossovers (on other chromosomes) placed a smaller fraction of their crossovers in chromosomal end zones and made their crossovers closer together (Fig. 4d,e, Extended Data Fig. 16; proportion of crossovers in distal zones in the 10% of cells with the highest crossover rate vs. the 10% of cells with the lowest crossover rate: Mann–Whitney W = 5,271,934.5, p = 2 × 10^−9; crossover separation between cells in these same deciles of crossover rate: Mann–Whitney W = 148,548,161, p = 3 × 10^−53; Methods). This result suggests that analogous relationships generate variation in meiotic outcome both among cells and across individuals (Discussion).
Aneuploidy across chromosomes and individual sperm donors
During meiosis, a chromosome can mis-segregate (non-disjoin), yielding two aneuploid gametes in which that chromosome is reciprocally absent (a loss) or present in two copies (a gain). The frequency of paternally-derived aneuploidy is typically measured by fluorescence in situ hybridization (FISH) in a few chromosomes in single sperm21–23 or inferred genome-wide from embryos42, 43. We measured the ploidy of each chromosome and chromosome arm in each of the 31,228 gametes by analyzing sequence coverage (Fig. 5a, Methods), finding 787 whole-chromosome aneuploidies and 133 chromosome arm-scale gains and losses. All chromosomes and sperm donors were affected, with the sex chromosomes and acrocentric chromosomes (13, 14, 15, 21, and 22) having the highest rates of aneuploidy, consistent with the results of FISH studies that include chromosomes X, Y, 21, and 22 (refs. 21–23, 44) (Fig. 5b).
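Coverage-based ploidy inference can be illustrated with a deliberately simple sketch. It assumes that each chromosome's expected share of reads is known (for example, from the average over many cells), normalizes a cell's per-chromosome read counts against the median chromosome, and rounds to the nearest copy number. The rounding rule, the function name `estimate_copies`, and the toy read counts are all assumptions for illustration, not the procedure described in the Methods.

```python
from statistics import median


def estimate_copies(cell_reads, expected_fraction):
    """Estimate per-chromosome copy number in one haploid cell from read counts.

    cell_reads        -- reads observed per chromosome in this cell
    expected_fraction -- expected share of reads per chromosome at one copy
    """
    # Reads per unit of expected share; roughly equal across one-copy chromosomes.
    ratios = {c: cell_reads.get(c, 0) / f for c, f in expected_fraction.items()}
    # The median chromosome is assumed to be present in exactly one copy.
    baseline = median(ratios.values())
    return {c: round(r / baseline) for c, r in ratios.items()}


# Hypothetical expected read shares for a five-chromosome toy genome.
expected_fraction = {"chr1": 0.35, "chr2": 0.33, "chr13": 0.16, "chr21": 0.08, "chr22": 0.08}

euploid = {"chr1": 350, "chr2": 330, "chr13": 160, "chr21": 80, "chr22": 80}
gain_21 = {"chr1": 350, "chr2": 330, "chr13": 160, "chr21": 160, "chr22": 80}
loss_13 = {"chr1": 350, "chr2": 330, "chr13": 0, "chr21": 80, "chr22": 80}

for name, reads in [("euploid", euploid), ("gain_21", gain_21), ("loss_13", loss_13)]:
    print(name, estimate_copies(reads, expected_fraction))
```

Normalizing to the median chromosome, rather than to total reads, keeps a single gained or lost chromosome from shifting the baseline for the others.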
The frequency of aneuploidy varied 4.5-fold among individual sperm donors, who had rates of 0.010 to 0.046 aneuploidy events per cell (Fig. 5c, Table 1). As expected, donors with more losses also had more gains (autosomes only Pearson’s r = 0.51, p = 0.02; including XY Pearson’s r = 0.62, p = 0.003). This variation in aneuploidy rate among 20 young sperm donors (18–38 years), who were judged by clinical criteria to have healthy sperm, appears to reflect genuine inter-individual variation in vulnerability to nondisjunction (rather than statistical noise, Supplemental Text), consistent with FISH-derived observations of aneuploidy frequency in six chromosomes among 10 donors22, 23.
Canonically, nondisjunction creates a loss and a gain, such that one might expect sperm with chromosome losses and gains to be equally common. However, we observed 2.4-fold more losses than gains (554 losses vs. 233 gains, proportion test p = 2 × 10^−30), and this asymmetry did not appear to reflect technical ascertainment bias (Supplemental Text; Extended Data Fig. 17). Among early embryos, losses of chromosomes are observed more frequently than gains, especially among paternal events42, 43; this imbalance has previously been attributed to post-fertilization mitotic chromosome loss, as it has not been observed in FISH studies18, 21, 23. However, our results suggest that gain/loss asymmetry may already be present among sperm.
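The loss/gain asymmetry can be checked with a short worked calculation using the counts reported above. The sketch assumes a two-sided one-sample z-test of the loss proportion against 0.5 (normal approximation, no continuity correction); the text does not specify which variant of the proportion test was used, so the resulting p-value should only be expected to agree in order of magnitude.

```python
import math


def one_sample_proportion_test(successes, total, p0=0.5):
    """Two-sided z-test (normal approximation) of a proportion against p0."""
    z = (successes - total * p0) / math.sqrt(total * p0 * (1 - p0))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts reported in the text.
losses, gains = 554, 233

ratio = losses / gains  # fold excess of losses over gains
z, p_value = one_sample_proportion_test(losses, losses + gains)
print(f"loss:gain ratio = {ratio:.2f}, z = {z:.2f}, p = {p_value:.1e}")
```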
Nondisjunction can occur at meiosis I (MI), when homologous chromosomes separate, or at meiosis II (MII), when sister chromatids separate. Because recombination occurs in MI (prior to disjunction) but does not occur at centromeres, homologs nondisjoined in MI will have different haplotypes at their centromeres, whereas sisters nondisjoined in MII will have the same haplotype at their centromeres (Fig. 5a, Methods). (On the sex chromosomes, X and Y disjoin in MI, and the sister chromatids of X and Y disjoin at MII.) Encouragingly, for chromosome 21, the principal chromosome for which earlier estimates (from patients with trisomy) were possible, our finding of 33% MI events and 67% MII events matched previous paternal estimates45.
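The division-of-origin logic described above reduces to a small decision rule, sketched here. The sketch assumes that the parental haplotype of each copy of a gained chromosome has already been determined at the centromere; the function names and input encoding are hypothetical.

```python
def classify_autosome_gain(centromere_haplotypes):
    """Classify a gained autosome by the parental haplotypes of its two copies
    at the centromere: different haplotypes imply MI, identical imply MII."""
    if len(centromere_haplotypes) != 2:
        raise ValueError("expected the haplotypes of exactly two copies")
    distinct = set(centromere_haplotypes)
    return "MI" if len(distinct) == 2 else "MII"


def classify_sex_chromosome_gain(chromosomes):
    """Classify a sex-chromosome gain: X with Y implies MI (X and Y disjoin in MI),
    whereas two X or two Y implies MII (sister chromatids disjoin in MII)."""
    pair = tuple(sorted(chromosomes))
    if pair == ("X", "Y"):
        return "MI"
    if pair in {("X", "X"), ("Y", "Y")}:
        return "MII"
    raise ValueError("expected two sex chromosomes, each 'X' or 'Y'")


print(classify_autosome_gain([1, 2]))            # both parental haplotypes present
print(classify_autosome_gain([2, 2]))            # the same haplotype twice
print(classify_sex_chromosome_gain(["Y", "X"]))  # X and Y together
```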
Across all chromosomes, 112 gains arose during MI (50 autosomal, 62 sex chromosome) and 120 during MII (92 autosomal, 28 sex chromosome). Sex chromosomes were 2.2 times more likely to be affected in MI than MII, whereas autosomes were 1.8 times more likely to be affected in MII than MI (proportion test, 35.2% MI gains on autosomes vs. 68.9% MI gains on sex chromosomes, p = 1.3 × 10^−6). Division-of-origin frequencies did not correlate either across chromosomes or sperm donors, implying that MI and MII have distinct nondisjunction vulnerabilities across people and individual chromosomes (Fig. 5d,e; across autosomes, Pearson’s r = 0.32, p = 0.15; across donors autosomes only, Pearson’s r = 0.06, p = 0.80; including XY, Pearson’s r = 0.17, p = 0.47) (consistent with studies of viable trisomies 13, 18, and 21 in embryos and individuals45–50).
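The comparison of MI fractions between autosomes and sex chromosomes can be reproduced approximately from the counts given above. The sketch assumes a two-sided two-sample proportion test with a pooled variance and a continuity correction; whether this matches the exact test variant the authors used is an assumption, so only the proportions, and the rough magnitude of the p-value, should be expected to agree.

```python
import math


def two_proportion_test(x1, n1, x2, n2, continuity=True):
    """Two-sided two-sample proportion test (pooled normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    diff = abs(p1 - p2)
    if continuity:
        # Continuity correction, equivalent to Yates' correction on the 2x2 table.
        diff = max(0.0, diff - 0.5 * (1 / n1 + 1 / n2))
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    p_value = math.erfc((diff / se) / math.sqrt(2))
    return p1, p2, p_value


# Gains by division of origin, as reported in the text.
autosome_mi, autosome_mii = 50, 92
sex_mi, sex_mii = 62, 28

frac_auto, frac_sex, p_value = two_proportion_test(
    autosome_mi, autosome_mi + autosome_mii,
    sex_mi, sex_mi + sex_mii,
)
print(f"MI fraction: autosomes {frac_auto:.1%}, sex chromosomes {frac_sex:.1%}, p = {p_value:.1e}")
```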
Relationship between aneuploidy and recombination
Although crossovers seem protective against nondisjunction in maternal meiosis25, 48–51, this relationship to aneuploidy is less clear in paternal meiosis29, 45, 52–54. To test whether nondisjunction associated with fewer crossovers in sperm, we compared the number of crossovers on gained chromosomes to those on chromosomes of normal copy number (we focused on gains because in the case of losses, it is impossible to determine what occurred on an absent chromosome). Crossovers on gained chromosomes were inferred as transitions between the presence of both haplotypes and the presence of just one haplotype. We compared the total number of crossovers on gained chromosomes (ascertainment criteria are described in Supplemental Text) to the total number of crossovers in 10,000 sets of correctly segregated chromosomes matched (to each gained chromosome) for donor and chromosome identity. Chromosome gains occurring in MI (when recombination happens) had 36% fewer total crossovers than the mean of the matched sets of well-segregated chromosomes (54 total crossovers on gains, 84.2 mean total crossovers on matched sets, one-sided permutation p < 0.0001), suggesting that crossovers protected against MI nondisjunction of the chromosomes on which they occurred (Fig. 5f). The same was not true of MII gains (Supplemental Text; Extended Data Fig. 18a).
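The matched-set comparison can be sketched as a resampling procedure. This is a minimal illustration, assuming that each matched set is built by drawing, for every gained chromosome, one normally segregated chromosome from the same donor and of the same chromosome identity, and that the one-sided p-value is the fraction of matched sets whose crossover total is at or below the observed total. The data structures, names, and toy counts are hypothetical; the ascertainment criteria in the Supplemental Text are not modeled.

```python
import random


def matched_set_test(gained, pools, n_sets=10_000, seed=0):
    """Compare total crossovers on gained chromosomes with matched control sets.

    gained -- list of ((donor, chromosome), crossover_count) for gained chromosomes
    pools  -- {(donor, chromosome): [crossover counts on normally segregated copies]}
    """
    rng = random.Random(seed)
    observed = sum(count for _, count in gained)
    totals = [
        sum(rng.choice(pools[key]) for key, _ in gained)
        for _ in range(n_sets)
    ]
    mean_total = sum(totals) / n_sets
    # One-sided: how often is a matched set as crossover-poor as the gains?
    p_value = sum(t <= observed for t in totals) / n_sets
    return observed, mean_total, p_value


# Hypothetical toy data.
gained = [(("d1", "chr21"), 0), (("d1", "chr13"), 1), (("d2", "chr21"), 0)]
pools = {
    ("d1", "chr21"): [1, 1, 1, 2],
    ("d1", "chr13"): [1, 2, 2, 2],
    ("d2", "chr21"): [1, 1, 2, 1],
}
observed, mean_total, p_value = matched_set_test(gained, pools)
print(observed, mean_total, p_value)

# Reported values from the text: 54 observed vs. 84.2 mean matched crossovers.
reduction = 1 - 54 / 84.2
print(f"{reduction:.0%} fewer crossovers on MI gains")
```

When no matched set is as extreme as the observed total, the empirical p-value is zero and is conventionally reported as below 1/n_sets, which corresponds to p < 0.0001 for 10,000 sets.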
We tested for broader relationships between crossover rates and aneuploidy at the cell and donor levels and found no clear effects, although we had limited power (Supplemental Text) (Extended Data Fig. 18b,c). One potential explanation for these findings is that the actual crossover, rather than the propensity toward crossing over in a cell or individual, is protective against aneuploidy, consistent with a model in which crossing over helps provide necessary chromosomal cohesion and/or tension for proper disjunction55.
Surprising chromosome-scale genomic anomalies
Aneuploidy is thought to arise from a single nondisjunction event that leads to loss (in one gamete) or gain (in the reciprocal gamete) of one chromosome copy. Surprisingly, we detected 19 cells that had two extra copies of entire (or nearly entire) chromosomes (2, 15, 20, and 21), perhaps due to sequential nondisjunction events in MI and MII (Fig. 6a,b, Extended Data Fig. 19a,b). More cells had three copies of chromosome 15 (n = 10) than two copies of chromosome 15 (n = 2) (Fisher’s exact test vs. Poisson p = 2 × 10^−7, Supplemental Text), raising the possibility that, for chromosome 15, MI nondisjunction leads to additional nondisjunction during MII.
Several sperm had complex aneuploidy events that were not explained by nondisjunction. These included: multiple cells with three copies of most, but not all, of the q arm of chromosome 15; one cell that gained the p arm of chromosome 4 while losing the q arm; and one cell with at least eight copies of most of the q arm of chromosome 4 (Fig. 6c-e; Extended Data Fig. 19c,d). We estimate that the gamete with at least eight copies of 127 Mb of 4q contained a minimum of 890 Mb of extra genomic DNA, demonstrating that the human sperm nucleus can accommodate at least 30% more DNA than is typically in the haploid genome (Fig. 6e). This gamete carried both parental haplotypes of chromosome 4, though the extra copies came from just one of the two parental haplotypes (93% of observed alleles of heterozygous SNPs in the amplified region were haplotype 2). We know of no mechanism that would generate such a gamete.
Discussion
The genomes of 31,228 human sperm cells revealed interconnected variation among diverse meiotic phenotypes. These relationships existed at different and sometimes multiple levels: (i) individuals’ average meiotic phenotypes; (ii) variation among single sperm cells from the same person; and (iii) specific chromosomes and events.
Rates of aneuploidy varied conspicuously (from 1.0% to 4.6%) among the 20 young sperm donors (Fig. 5c). Aneuploidy was less likely when a chromosome had more crossovers, though at higher levels of organization (cells and donors) aneuploidy rates and crossover rates varied independently (Extended Data Fig. 18). Some chromosomes were more vulnerable to nondisjunction in MI, and others to nondisjunction in MII; some donors were more vulnerable to nondisjunction in MI, and others to nondisjunction in MII (Fig. 5d,e). These results suggest a complex landscape of vulnerability to aneuploidy in which inter-individual variation is multi-faceted and considerable in magnitude.
Inter-individual variation in crossover rates has previously been visible through computational analyses of SNP data1–10. Here, single-gamete sequencing revealed that donors with high crossover rates also exhibit other meiotic phenotypes, including a tendency to make crossovers closer together and to place a smaller fraction of their crossovers in telomere-proximal zones (Figs. 3, 4). The same underlying biological variation may shape all three phenotypes (rate, location, and separation).
Individual cells from the same donor also appeared to have underlying meiotic proclivities that coordinated these meiotic outcomes across the genome and with one another. This was observed in the correlation of crossover number across different chromosomes: even among cells from the same donor, gametes with more crossovers in half of their genome tended to have more crossovers in the other half. High-crossover-rate cells also made pairs of crossovers (on the same chromosome) closer together (in genomic distance) and placed proportionally fewer of their crossovers in telomere-proximal chromosomal regions (Fig. 4d,e).
What could cause these meiotic phenotypes to be coupled to one another, across chromosomes and at multiple levels of organization (cells and individuals)? Intriguingly in this regard, the physical length of meiotic chromosomes – which is inversely related to their degree of compaction – has been observed to vary among meiotic cells, and individual cells with more compacted (shorter) chromosomes also tend to have fewer crossovers24–26, 56. A simple model (Fig. 7) might explain the observed correlations: cell-to-cell and person-to-person variation in the compaction of meiotic chromosomes could cause the variation in and correlations among crossover rate, location, and interference, provided that crossover interference occurs as a function of physical (micron) distance along the meiotic chromosome axis/synaptonemal complex rather than genomic (base pair) distance25, 34, 57, 58 and that the first crossover on a chromosome is more likely to occur near a telomere11–13 (Fig. 7).
Human genetics research has revealed that recombination phenotypes are heritable and associate with common SNPs at many genomic loci4, 6–10. The largest genome-wide association study of crossover phenotypes recently found that variation in crossover rate and placement is associated with SNP haplotypes near genes that encode components of the synaptonemal complex, which connects and compacts meiotic chromosomes8. It is reasonable to hypothesize that inherited genetic variation at these loci might bias the average degree of compaction along the chromosome axis or synaptonemal complex, particularly given that this same property varies among cells from the same donor24–26. Such a model would offer a natural integration of observations about inter-individual and gamete-to-gamete variation, and of relationships among diverse meiotic phenotypes (Fig. 7).
Our results suggest that, in meiosis, a shared set of patterns and constraints shapes inter- and intra-individual (single-cell) variation in meiotic outcomes. It is an intriguing possibility that such parallel relationships manifest in diverse aspects of cellular biology and genetics.
Author Contributions
A.D.B. and S.A.M. conceived and led the studies. A.D.B., S.A.M., and C.J.M. developed the experimental methods. A.D.B. and C.J.M. performed the experiments and generated the data. A.D.B. and S.A.M. designed the analysis strategies, and A.D.B. performed the analyses. A.D.B., J.N., S.A.B., and A.W. wrote the software and analytical methods. A.D.B. and S.A.M. wrote the manuscript with contributions from all authors.
Competing Interests
A.D.B. and S.A.M. are inventors on a patent application, submitted by Harvard University and the Broad Institute, which covers experimental and analytical methods described in this manuscript.
Data Availability
Crossover and aneuploidy data (individual events and counts per donor and/or cell) are available via Zenodo, http://dx.doi.org/10.5281/zenodo.2581571. Raw sequence data will be deposited in the SRA via dbGaP (in process).
Code Availability
Analysis scripts and documentation are available via Zenodo, http://dx.doi.org/10.5281/zenodo.2581596.
Supplementary Information, which includes Supplemental Text and Methods, is available.
Extended Data Figures follow in this document.
Acknowledgments
We thank Giulio Genovese for suggestions on analyses, Evan Macosko for advice on technology development, and other members of the McCarroll lab, including Chris Whelan, Steven Burger, and Bob Handsaker, for their advice. We thank Mark Daly, Joel Hirschhorn, Stephen Elledge, and Samantha Schilit for their insights, 10X Genomics for discussions about reagents, and Christina L. Usher and Christopher K. Patil for contributions to the manuscript text and figures. This work was supported by R01 HG006855 to S.A.M., by a Broad Institute NextGen award to S.A.M., and by a Harvard Medical School Program in Genetics and Genomics NIH Ruth L. Kirschstein training grant to A.D.B.