Aberrant landscapes of maternal meiotic crossovers contribute to aneuploidies in human embryos

Meiotic recombination is crucial for human genetic diversity and chromosome segregation accuracy. Understanding its variation across individuals and the processes by which it goes awry are long-standing goals in human genetics. Current approaches for inferring recombination landscapes either rely on population genetic patterns of linkage disequilibrium (LD)—capturing a time-averaged view—or direct detection of crossovers in gametes or multi-generation pedigrees, which limits dataset scale and availability. Here, we introduce an approach for inferring sex-specific recombination landscapes using data from preimplantation genetic testing for aneuploidy (PGT-A). This method relies on low-coverage (<0.05×) whole-genome sequencing of in vitro fertilized (IVF) embryo biopsies. To overcome the data sparsity, our method exploits its inherent relatedness structure, knowledge of haplotypes from external population reference panels, as well as the frequent occurrence of monosomies in embryos, whereby the remaining chromosome is phased by default. Extensive simulations demonstrate our method’s high accuracy, even at coverages as low as 0.02×. Applying this method to PGT-A data from 18,967 embryos, we mapped 70,660 recombination events with ~150 kbp resolution, replicating established sex-specific recombination patterns. We observed a reduced total length of the female genetic map in trisomies compared to disomies, as well as chromosome-specific alterations in crossover distributions. Based on haplotype configurations in pericentromeric regions, our data indicate chromosome-specific propensities for different mechanisms of meiotic error. Our results provide a comprehensive view of the role of aberrant meiotic recombination in the origins of human aneuploidies and offer a versatile tool for mapping crossovers in low-coverage sequencing data from multiple siblings.


Supplemental Figures
Chromosome 16 Common The output for siblings 1+2 The output for siblings 1+3 The output signal of our method F. E. D.
Figure S1.Schematic of our method to detect meiotic crossovers. A. IVF cycles that include a monosomic embryo are identified via conventional coverage-based copy number calling.B. Initially, we have no knowledge about the haplotype composition.C. For the affected chromosome, each disomic sibling is then contrasted with the monosomic sample to identify regions where the haplotypes match.D. Positive signal emitted by our algorithm indicates evidence of non-matching, whereas negative signal indicates evidence of matching.E. Output from various sibling embryos is compared in order to attribute crossovers to specific samples.F. Apparent crossovers that are shared among sibling embryos can be attributed to the reference monosomic sample, while other crossovers are attributed to the test samples.Figure S6.Chromosome-specific counts of aneuploidies detected in PGT-A data.We analyzed the CReATe PGT-A dataset with WisecondorX to infer the copy number of all autosomes across 20, 114 IVF embryos.We observe that segmental gains and losses are more common on longer chromosomes, while whole-chromosome aneuploidies are more common on short chromosomes, especially Chromosomes 15, 16, 21, and 22, consistent with previous literature.

Figure S7
. Distribution of maternal crossovers across normal disomic embryo samples.Crossovers were identified as transitions between tracts that matched versus did not match a sibling haploid/GW-isoUPD sample (only maternal chromosomes present).Each tract was required to cover at least 15 genomic windows and achieve a z-score of the at least 1.96.The length of each chromosome is normalized to 1 to aid visualization.The region size, where crossovers occur across sibling embryos at similar chromosomal positions and are solely associated with the reference monosomy, is set to 1.5% of the chromosome size.

Figure S8
. Distribution of paternal crossovers across normal disomic embryo samples.Crossovers were identified as transitions between tracts that matched versus did not match a sibling monosomic sample (only paternal chromosome present).Each tract was required to cover at least 25 genomic windows and achieve a z-score of the at least 1.96.The length of each chromosome is normalized to 1 to aid visualization.The region size, where crossovers occur across sibling embryos at similar chromosomal positions and are solely associated with the reference monosomy, is set to 3% of the chromosome size.

Figure S12
. Validating the assumed parental (i.e., sex-specific) origins of crossovers.We compared rates of putative maternal crossovers (identified via comparison to haploid/GW-isoUPD embryos) per bin inferred in our study to published male-specific crossovers from deCODE, stratifying across autosomal chromosomes.The Pearson correlation (r ) was reduced by > 0.25 for all autosomes, compared to the correlation of our results with deCODE female-specific recombination rate (Figure S9).This supports our hypothesis that vast majority of the effective-haploids carry the maternal genome, consistent with previous reports [33].     .Differences in distributions of crossovers between disomic and trisomic samples across Chromosomes 1 through 6.We calculated the distribution of maternal crossovers in embryos with two distict homologs (disomy) as well as three distinct homologs (trisomy) for Chromosomes 1-6 (35 bins per chromosome).Crossovers were identified as transitions between regions of "matched" and "unmatched" haplotypes and vice versa, where each region is classified with a z-score of the at least 1.96 and included at least 15 genomic windows for LD-CHASE or at least 30 genomic windows for LD-PGTA.

Figure S17
. Differences in distributions of crossovers between disomic and trisomic samples across Chromosomes 7 through 12.We calculated the distribution of maternal crossovers in embryos with two distict homologs (disomy) as well as three distinct homologs (trisomy) for Chromosomes 7-12 (35 bins per chromosome).Crossovers were identified as transitions between regions of "matched" and "unmatched" haplotypes and vice versa, where each region is classified with a z-score of the at least 1.96 and included at least 15 genomic windows for LD-CHASE or at least 30 genomic windows for LD-PGTA.

Figure S18
. Differences in distributions of crossovers between disomic and trisomic samples across Chromosomes 13 through 18.We calculated the distribution of maternal crossovers in embryos with two distict homologs (disomy) as well as three distinct homologs (trisomy) for Chromosomes 13-18 (we have 35 bins in Chromosomes 13-16 and 20 bins in Chromosomes 17-18).Crossovers were identified as transitions between regions of "matched" and "unmatched" haplotypes and vice versa, where each region is classified with a z-score of the at least 1.96 and included at least 15 genomic windows for LD-CHASE or at least 30 genomic windows for LD-PGTA.Table S3.Comparisons of genomic distributions of crossovers between disomic and trisomic samples.A Kolmogorov-Smirnov (KS) test was used for quantifying the difference among the three crossover distributions: (1) Disomies that were analyzed via LD-CHASE, (2) Disomies that were analyzed via LD-PGTA, and (3) Trisomies that were analyzed via LD-PGTA.One advantage of the KS test is that it does not require binning, as it quantifies the distance between two empirical cumulative distributions.Moreover, the exact p-value was computed using the permutation method.This involved evaluating all possible combinations of assignments of the combined data into two groups of the sizes of the two original samples and computing the KS statistic for each combination.The p-value is then the proportion of permutations that result in a KS statistic as extreme as (or more extreme than) the observed statistic.We note that the p-values for the comparison of Disomies with LD-CHASE vs. Trisomes with LD-PGTA were equal to or below 0.05 for chromosomes 7, 14, 16, 20, 21, and  Table S4.Number of IVF cycles used for mapping paternal crossovers.Relevant cycles are defined as those with at least one embryo affected by monosomy (of one or few chromosomes), as well as at least one embryo that is disomic for the same chromosome.In addition, we report the average number of disomic sibling embryos within these cycles.Table S5.Number of IVF cycles used for mapping maternal crossovers.Relevant cycles are defined as those with at least one embryo affected by haploidy/GW-isoUPD, as well as at least one embryo that is disomic for one or more chromosomes.In addition, we report the average number of disomic sibling embryos within these cycles.The combination of reads that is associated with homologs f and g can be expressed as {A, B, D, F} and {C, E}, respectively.

Supplemental Tables
Each read is presented as a sequence of observed alleles at known SNP positions that overlap with the read.Moreover, each allele is presented as a sequence of position and nucleotide, e.g., (821,A).The reads in our example configuration are expressed as: Another useful way to present the reads that are associated with a certain homolog is a sequence of "occupation numbers": (n A , n B , . . ., n i , . . .n F ).
Here n i = 1 when read i (with i = A,B,. . .,F) originated from the considered homolog and n i = 0 when the read originated from another homolog.Continuing with the considered configuration, the "occupation numbers" for homologs f and g are shown in Table S6.
Based on this knowledge, we are in a position to derive the statistical model.There are 2 6 possible ways to distribute 6 reads between two homologs, and we assume that all possible configurations of reads are equally likely when sampling from a genomic library.
Given a particular configuration of reads, the probability of observing the two haplotypes that are formed by these combinations of reads is f (Reads 2 Homolog f ) ⇥ g(Reads 2 Homolog g).Thus, the probability of sampling n-reads from genomic library of a diploid organism is SNP position Genotypes of individual 1 Genotypes of individual 2 Genotypes of individual 3 Table S7.A population-specific reference panel of individual genotypes, sharing the same ancestral population as homolog f .The reference panel describes the genotypes of 3 individual at SNP positions, with the haplotype that was observed in reads A,B,D and F being highlighted.

SNP position Genotypes of individual 4 Genotypes of individual 5 Genotypes of individual 6
734 where n is the number of sampled reads, (Z2 Z2 is a n-fold Cartesian product of Z2.Thus, we sum over all the possible configurations of reads, using the "occupation-number" representation.In addition, XOR is the logical operation "Exclusive or", e.g, (0,1,0) XOR (1,1,1)=(1,0,1).The rationale behind the XOR operation is that every read that is not associated with homolog f is necessarily associated with homolog g.
We notice that when both homolog f and homolog g are associated with the same ancestral population, every term in the sum appears twice.Hence, we simplify the expression by summing over half of the terms and multiply the result by 2: where Z1 ⇥ (Z2) 2 = (0, a, b) | a 2 {0, 1} and b 2 {0, 1} , meaning that we sum over all the sequences of length n with the first element fixed to 0.
• The notation (f $ g) is used to represent the sum of all the other terms in the expression with f and g exchanged, e.g., • For distant-admixtures we define an effective frequency, f (x) ⌘ ↵1f1(x) + ↵2f2(x) + . . .+ ↵nfn(x), which takes into account that a haplotype have originated from one of several populations with probability ↵ i to originate from the ith population.
• Since all the reads that were drawn from the library of the monosomy originated from the same DNA molecule, we can easily generalize the models to any number of reads from the monosomy library by the substitution A ! A1, A2, . ... For example, the models for 2 + 1 reads is where we redefine g(BC . . .Z) ⌘ f (A1A2BC . . .Z).
Simulating a parental gamete and a diploid offspring in distant admixture scenarios.
First, we divide the genome into regions of 1 -10 Mbp.For simplicity we assume a distant admixture that involves only two ancestral populations, but it is straight forward to extend the generative model to any number of ancestral populations.We choose an ancestral makeup where %(100 • ↵) of the haplotypes are associated with population A, while % 100 • (1 ↵) of the haplotypes are associated with population B. Then, we draw 3 pairs of effective haploids with each pair composed of one haploid that is associated with population A and a second one that is associated with population B.
We consider the first pair and to each genomic region we assign either the effective haploid from population A with a probability ↵ or the effective haploid from population B with a probability 1 ↵.Then, we repeat the process for the two other pairs, so that three effective haploids would be assigned to each genomic region.In each region, the first two haploids would be used to simulate the diploid offspring, while the first and third haploids would be used to simulate the parental gamete and the unrelated gamete, respectively.Then, we simulate reads by selecting a random position along the chromosome from a uniform distribution, representing the midpoint of an aligned read with a given length.Based on the selected position, one out of the six haplotypes was drawn from a discrete distribution, where the probability of haplotype h depends on the position of the read, x.
In each genomic region either p i,A = 0 or p i,B = 0 with i = 1, 2, 3, and hence within each genomic region the discrete distribution reduces to When simulating a diploid offspring, the first haplotype is just as likely as the second haplotype, p2 = p1 and the third haplotype is absent, p3 = 0. Similarly, for the parental gamete p1 = 1 and p2 = p3 = 0, while for the unrelated gamete p3 = 1 and p1 = p2 = 0.Then, from the selected haplotype, h, a segment of length l that is centered at the selected chromosomal position, x, is added to simulated data, mimicking the process of short-read sequencing.This process of generating simulated sequencing data is repeated until the desired depth of coverage is attained.Table S9.We assign 3 haploids to each region in the genome.Each haploid is selected from a pair of haploids, where each haploid in the pair is associated with a different ancestral population.The probability to draw a haploid that is associated with population A is ↵, while it is 1 ↵ for population B. In the table we show a possible outcome of this procedure for 6 regions.Haploids that were drawn from the first two pairs are used to simulate a diploid offspring, while haploids that were drawn from the first and third pair are used to simulate the matched and unmatched gametes, respectively.
An example for calculating the score of a read Here we use the score metric that was introduced in the subsection "Prioritizing informative reads" to calculate the score of an example read, shown Figure S23.In this example, we consider the ancestral makeup: 30% population A and 70% population B. In addition, we assume that for each population we have a suitable reference panel.

Read
Reference genome a t c a g g c a t t t a t t g c a g g g c a c c a a g c a a (Top) A read that is aligned to a reference genome.(Bottom) A table of possible haplotypes in the chromosomal region that overlaps with the read, the frequencies of the haplotypes (as estimated from two reference panels), and the scores of the haplotypes.

Figure S2 .Figure S3 .
Figure S2.Signatures of various forms of chromosome abnormality with respect to their composition of genetically identical versus distinct parental homologs.Normal gametogenesis and fertilization produce a zygote with two genetically distinct copies of each chromosome-one copy from each parent.The vast majority of monosomies arise from maternal meiotic errors (occurring either during meiosis I or II), such that the zygote solely possesses the paternally inherited copy of the chromosome.Conversely, in the case of genome-wide uniparental isodisomy (GW-isoUPD), all homologous chromosomes are identical and are typically maternally inherited[33].Meiotic-origin trisomies may be diagnosed by the presence of one or more tracts with three distinct parental homologs (i.e., transmission of both parental homologs [BPH] from a given parent; indicated by black boxes).

Figure S4 .
Figure S4.Evaluating the sensitivity and specificity of LD-CHASE across populations based on simulation.Balanced ROC curves for simulations at varying depths of coverage, sample ancestries, and admixture scenarios, where the balanced true (false) positive rate is an average of the true (false) positive rate and the true (false) negative rate.Using phased genotypes from the 1000 Genomes Project, we simulated pairs of Monosomy 16 and Disomy 16, where half of the pairs had matched haplotypes and the other half had unmatched haplotypes.Then we divided Chromosome 16 into 45 bins (of ⇠ 2 Mbp).For each bin, we calculated a balanced ROC curve and then averaged the curves across bins by using a linear interpolation to obtain the balanced true positive rate that is associated with a fixed balanced false positive rate.All the simulated admixture involved equal proportions of ancestry from the component populations.Balanced ROC curves were calculated by varying the z-score threshold.

Figure S5 .
Figure S5.Ancestry inference from low-coverage PGT-A data.Genetic similarity between test samples and external reference samples informs the selection of ancestry-matched reference panels.Principal component axes were defined based on analysis of 1000 Genomes reference samples and colored according to superpopulation annotations (top row).Low-coverage embryo samples were then projected onto these axes (bottom row) using a Procrustes approach implemented with LASER (v2.0) [59], and genetic similarity to reference samples was determined using the k-nearest neighbors algorithm (k = 150) based on rectilinear distance on the top 32 principal components.For plotting purposes, we associated each test sample with up to two reference superpopulations if a given superpopulation comprised at least 15% of the nearest neighbors.

Figure S9 .
Figure S9.Validating the genomic distribution of maternal crossovers.We compared rates of maternal crossovers per bin inferred in our study to published data from deCODE[25].We defined the recombination rate as the number of crossovers in a genomic bin per sampled homolog and computed Pearson correlation coefficients (r ) between the studies.The number of bins per chromosome was selected to minimize the p-value (but not the correlation coefficient).The deCODE recombination map is treated as gold-standard, as maternal crossovers were inferred from 70, 086 samples[25].

Figure S10 .
Figure S10.Validating the genomic distribution of paternal crossovers.We compared rates of paternal crossovers per bin inferred in our study to published data from deCODE[25].We defined the recombination rate as the number of crossovers in a genomic bin per sampled homolog and computed Pearson correlation coefficients (r ) between the studies.The number of bins per chromosome was selected to minimize the p-value.The deCODE recombination map is treated as gold-standard, as paternal crossovers were inferred from 56, 321 samples[25].

Figure S11 .
Figure S11.Validating the assumed parental (i.e., sex-specific) origins of crossovers. A. Comparing rates of putative maternal crossovers (identified via comparison to haploid/GW-isoUPD embryos) per bin inferred in our study to published female-specific crossovers from deCODE.B. Comparing rates of putative maternal crossovers (identified via comparison to haploid / GW-isoUPD embryos) per bin inferred in our study to published male-specific crossovers from deCODE.The reduction in the correlation coefficient by 0.53 supports our assumption that the vast majority of haploid/GW-isoUPD embryos solely possess the maternally inherited genome, consistent with previous reports[33].

Figure S13 .
Figure S13.Evaluating the sensitivity and specificity of LD-PGTA[28]  along Chromosome 16 based on simulation. A. A generic balanced ROC curve, where the balanced true (false) positive rate is an average of the true (false) positive rate and the true (false) negative rate.B. A schematic describing our trinomial classification approach, which includes SPH, BPH, and ambiguous classes.Simulated samples with confidence intervals spanning zero were assigned to the ambiguous class.C. Using phased genotypes from the 1000 Genomes Project, we simulated samples with both parental homologs (BPH) and single parental homologs (SPH) configurations for Chromosome 16.Then we divided Chromosome 16 into 45 bins (of ⇠ 2 Mbp), and for each bin we calculated the area under the balanced ROC curve.Balanced ROC curves were calculated by varying the z-score threshold.

Figure S14 .
Figure S14.Evaluating the sensitivity and specificity of LD-PGTA[28]  across populations based on simulation.Balanced ROC curves for simulations at varying depths of coverage, sample ancestries, and admixture scenarios, where the balanced true (false) positive rate is an average of the true (false) positive rate and the true (false) negative rate.Using phased genotypes from the 1000 Genomes Project, we simulated samples with both parental homologs (BPH) and single parental homologs (SPH) configurations for Chromosome 16.Then we divided Chromosome 16 into 45 bins (of ⇠ 2 Mbp).For each bin, we calculated a balanced ROC curve and then averaged the curves across bins by using a linear interpolation to obtain the balanced true positive rate that is associated with a fixed balanced false positive rate.All the simulated admixture involved equal proportions of ancestry from the component populations.Balanced ROC curves were calculated by varying the z-score threshold.

Figure S15 .
Figure S15.Comparing the distributions of maternal crossovers between LD-CHASE and LD-PGTA.We calculated the Pearson correlation coefficient between rates of crossovers per genomic bin inferred by LD-PGTA versus LD-CHASE, stratifying across autosomal chromosomes.The number of bins was chosen to minimize the p-value (but not the correlation coefficient).

Figure S16
Figure S16.Differences in distributions of crossovers between disomic and trisomic samples across Chromosomes 1 through 6.We calculated the distribution of maternal crossovers in embryos with two distict homologs (disomy) as well as three distinct homologs (trisomy) for Chromosomes

Figure S19 .FigureFigure S21 .
Figure S19.Differences in distributions of crossovers between disomic and trisomic samples across Chromosomes 19 through 22.We calculated the distribution of maternal crossovers in embryos with two distict homologs (disomy) as well as three distinct homologs (trisomy) for Chromosomes 19-22 (20 bins per chromosome).Crossovers were identified as transitions between regions of "matched" and "unmatched" haplotypes and vice versa, where each region is classified with a z-score of the at least 1.96 and included at least 15 genomic windows for LD-CHASE or at least 30 genomic windows for LD-PGTA.

Figure
Figure S22.A possible configuration of 6 reads.In this configuration, reads A,B,D and F are associated with homolog f and reads C and E are associated with homolog g.

Table S1 .
Summarization of chromosome copy number abnormalities that were detected in the CReATe dataset.Number of whole-chromosome (left two columns) and segmental (right two columns) gains and losses per chromosome as detected by conventional coverage-based analysis.

Table S2 .
Stratification of trisomies by autosome and inferred source of error.Chromosome-wide patterns of SPH were designated as potentially mitotic in origin. 22.

Table S8 .
A population-specific reference panel of individual genotypes, sharing the same ancestral population as homolog g.The reference panel describes the genotypes of 3 individual at SNP positions, with the haplotype that was observed in reads C and E being highlighted.