Sexual dimorphism and the effect of wild introgressions on recombination in Manihot esculenta

Recombination has essential functions in evolution, meiosis, and breeding. Here, we use the multi-generational pedigree, consisting of 7,165 informative meioses (3,679 female; 3,486 male), and genotyping-by-sequencing (GBS) data from the International Institute of Tropical Agriculture (IITA) to study recombination in cassava (Manihot esculenta). We detected recombination events using SHAPEIT2 and duoHMM, examined the recombination landscape across the 18 chromosomes of cassava and in regions with known introgressed segments from cassava’s wild relative Manihot glaziovii, constructed a genetic map and compared it to an existing map constructed by the International Cassava Genetic Map Consortium (ICGMC), and inspected patterns of recombination placement in male and female meioses to see if there is evidence of sexual dimorphism in crossover distribution and frequency. We found that the placement of crossovers along chromosomes did not vary between the two sexes but that females undergo more meiotic recombination than males. We also observed that introgressions from M. glaziovii decreased recombination in the introgressed region and, in the case of chromosome 4, along the entire length of the chromosome that the introgression is on. We observed a dosage effect on chromosome 1, possibly suggesting the presence of a variant on the M. glaziovii haplotype that leads to lower overall recombination in the introgressed region.

frequency. We found that the placement of crossovers along chromosomes did not 54 vary between the two sexes but that females undergo more meiotic recombination 55 than males. We also observed that introgressions from M. glaziovii decreased 56 recombination in the introgressed region and, in the case of chromosome 4, along 57 the entire length of the chromosome that the introgression is on. We observed a 58 dosage effect on chromosome 1, possibly suggesting the presence of a variant on the 59 M. glaziovii haplotype that leads to lower overall recombination in the introgressed 60 region. 61

INTRODUCTION
and three produced ambiguous results. We excluded these 13 TMS13 and TMS14 153 individuals from further analyses. Table 1   AlphaAssign calculates the posterior probability of these four relations, given the 175 observed allelic depth data of c and t (and if known, the allelic depth data for a 176 known parent of t; if individual t has no known parent, the algorithm makes use of a 177 'dummy parent' whose genotype probabilities at a given site are calculated using 178 estimated allele frequencies and assuming Hardy-Weinberg Equilibrium). 179 Because AlphaAssign looks at the relationship between pairs of individuals 180 rather than among triplets, we needed to run AlphaAssign two times to validate 181 IITA's pedigree information. We walk through the validation procedure for GG 182 individuals. In the first run, we provided the algorithm no pedigree information (i.e., 183 all calculations involved the use of a dummy parent). For each target individual, we 184 listed all GG individuals as candidate parents (we did not list an individual as its 185 own candidate parent). We fed the algorithm allelic depth data from 1,000 randomly 186 sampled sites across cassava's 18 chromosomes. To filter for LD, we sampled sites 187 such that no two sites fell within 20 kb from one another. For each target individual, 188 we identified the candidate individual with the highest score statistic and listed this 189  We included IITA-TMS-IBA062021 in the analysis when phasing and imputing. 220 Removal of this duo is inconsequential since this duo provides an uninformative 221 meiosis (see the section "Filtering the SHAPEIT2-duoHMM output" below for a 222 discussion of informative meioses). We then removed sites with a mean depth 223 (calculated across all samples) greater than 120 to avoid spurious genotype calls 224 within repeat regions, i.e., paralogs. 225 226 Generating input data files for SHAPEIT2 227 SHAPEIT2 takes called genotypes as input. To obtain a set of called genotypes 228 for our sample, we first calculated genotype posterior probabilities for each 229 individual at each site. Given observed data ! (!) and fixed sequencing error rate e = 0.01, we computed the likelihood for genotype ! (!) = . We calculated genotype 231 likelihoods for a single individual at a single site, independent of all other 232 individuals and sites in the sample, using the following equation: 233 We called genotypes from these estimated posterior genotype probabilities, calling a 251 genotype for individual d at site v only if one of the three possible genotypes had a 252 posterior probability greater than or equal to 0.99. To qualitatively examine how 253 SHAPEIT2 performs at different levels of missing data, we generated two datasets: 254 one dataset where we removed sites that had more than 20% missing data and 255 another where we removed sites that had more than 30% missing data We 256 observed that when more markers are retained, SHAPEIT2-duoHMM detected a 257 larger number of crossovers but crossover intervals were longer (a recombination 258 event can only be resolved down to the region between its two flanking 259 heterozygous markers in the parent). Results from the 20% dataset were very noisy, 260 so we selected the 30% dataset to analyze. Table 3 shows the number of sites after 261 applying the 30% maximum-missing filter for each chromosome. Supplementary 262 Figure 1 shows the plots for each chromosome's duoHMM-inferred crossover 263 intervals for the 20% and 30% maximum-missing datasets. 264 265  1  30575  17159  3739  2  24151  14818  2265  3  22156  14493  2458  4  18271  9980  2519  5  20750  14106  1681  6  20174  12907  1609  7  12226  7163  1114   8   17991  11549  1689  9  18267  12194  1341  10  14985  8206  1573  11  18816  11267  1777  12  15906  9921  1703  13  15851  9969  2018  14  20635  11957  2939  15  20048  13239  1705  16  15447  9832  1267  17  15390  8716  2206  18 15053 9063 1524 270

Running SHAPEIT2 and duoHMM 271
We used SHAPEIT2 and duoHMM to detect SNP intervals flanking a crossover 272 event (a recombination event can only be resolved down to the region between its 273 two flanking heterozygous markers in the parent). We To run SHAPEIT2 with the 'duoHMM' flag, we provided the algorithm with a 290 set of genotypes, a genetic map, and verified pedigree information. SHAPEIT2 291 outputs either a single set of estimated (most-likely) haplotypes or a haplotype 292 graph that encapsulates the uncertainty about the underlying haplotypes. We chose 293 the latter output. SHAPEIT2 has multi-threading capabilities, but we chose not to 294 use this feature in order to maximize the number of individuals that SHAPEIT2 295 conditions on during Gibbs sampling. We ran SHAPEIT2 with 14 burn-in iterations, 296 16 pruning iterations, and 40 main iterations. We increased the number of 297 conditioning states to 200 states per SNP. The developers found it slightly 298 advantageous to use a window size larger than 2 Mb when large amounts of 299 identical by descent (IBD) sharing are present. We used a window size of 5 Mb. We 300 provided SHAPEIT2 a genetic map that specifies the recombination rate between 301 SNPs. We generated this genetic map by interpolating genetic distances of GBS 302 markers using ICGMC's composite genetic map. We used the default value of 15,000 303 for the effective population size, a parameter that scales the recombination rates 304 that SHAPEIT2 uses to model patterns of LD. 305 306

Detecting recombination events using duoHMM 307
Once duoHMM corrected SEs in the SHAPEIT2-inferred haplotypes, we reran 308 duoHMM to infer recombination events. The HMM infers recombination events by calculating the probability of a recombination event between markers [11]. To 310 detect crossovers, we sampled a haplotype pair for each individual from SHAPEIT2's 311 diploid graph then calculated the probability of a recombination event between 312 pairs of markers. We repeated this process a total of 10 times then averaged the 313 inter-SNP recombination probabilities across the 10 iterations. We included a 314 crossover interval in subsequent analyses if the interval had a probability greater 315 than or equal to t = 0.5. Supplementary Figure 1 shows those crossover intervals 316 with probabilities greater than or equal to t = 0.9. 317 318

Filtering the SHAPEIT2-duoHMM output 319
The power to detect recombination events is dependent on the structure of 320 the pedigree. In a nuclear family with >2 offspring, most crossover events should be 321 detectable, and we classify these pedigrees as informative towards recombination. 322 We analyzed data from only those pedigrees having "informative" meioses, which 323 we defined as a nuclear family consisting of >2 offspring or a pedigree consisting of 324 three generations. We refer to the parents of these pedigrees as "informative 325 parents" and the meioses in these pedigrees as "informative meioses". Of the total 326 8,678 meioses in the data set, 7,165 were informative (3,679 female meioses; 3,486 327 male meioses). 328 329

Building sex-averaged genetic maps
To build a genetic map for each chromosome, we first calculated the number 331 of crossover events that occurred between each pair of SNPs. If a crossover event 332 spanned multiple SNP intervals, we assigned a fraction of the crossover event to 333 each of the spanned intervals, calculated as 1/(length of the SNP interval in bps). We window varied between the sexes, we performed a chi-square test of equal counts in 346 each window. To calculate the expected number of male crossovers in a given 347 window, we calculated the proportion of total meioses analyzed that were male (i.e., 348 3,486/(3,679 + 3,486)) then multiplied this value by the total number of crossovers 349 in the window. We calculated the expected number of female crossovers in a given 350 window in the same way. We did not test for statistical significance in the last 351 window of any chromosome since the last window is shorter than 1-Mb (no 352 chromosome is perfectly divisible by 1-Mb). We could not perform the chi-square test for four of the 510 windows because these windows had one or more classes 354 with an expected frequency count of less than five. We tested each window at a 355 Bonferroni-corrected significance level of α/m, where α = 0.05 and m = 506 (i.e., the 356 total number of windows tested). We also performed this test genome-wide at a 357 significance level of 0.05. 358 359

Examining if crossover placements are random and independent events 360
If crossover placements are random and independent events, the distribution 361 of the number of crossovers observed on a given chromosome in a given parent-362 offspring pair is expected to follow a Poisson distribution. We used the deviance 363 goodness of fit test to determine if crossover placements are Poisson distributed. 364 For each chromosome, we performed a Poisson regression where we modeled the 365 number of crossovers observed in a given parent-offspring pair Y as a function of 366 the covariates "parent" and "sex". The "parent" covariate specifies the parent 367 involved in the parent-offspring pair, and the "sex" covariate specifies whether the 368 parent was a female or male (i.e., were the crossovers observed in a male or female 369 meiosis). We used the residual deviance to perform a chi-square goodness of fit test window varied between the sexes, we performed a chi-square test of equal counts in 510 each window. We did not test for statistical significance in the last window of any 511 chromosome since the last window is shorter than 1-Mb. Of the 506 intervals tested, 512 45 (8.9%) passed the significance threshold. In these 45 intervals, female crossover 513 count was significantly higher than expected and male crossover count was 514 significantly lower than expected, a pattern observed in other taxa [18]. Statistically 515 significant intervals did not consistently appear in any specific region of the 516 chromosomes ( Supplementary Fig 3). 517 518 519 Figure 2 Distribution of crossover events across chromosome 1 for female 520 meioses and male meioses. 521 We divided each chromosome into 1-Mb windows and plotted the number of 522 crossovers falling within each interval for female (red), and male (blue) meioses. 523 Solid lines represent observed counts. Dashed lines represent expected counts, 524 assuming that crossing over occurs with equal frequency in females and males. 525 Asterisks show intervals with significantly different crossover frequency between 526 male and female meioses. Dashes represent cases where we could not perform the 527 chi-square test because the expected frequency count for one or more classes was 528 less than five. We did not test for statistical significance in the last window of any 529 chromosome since the last window is shorter than 1-Mb (no chromosome is 530 perfectly divisible by 1-Mb). The centromere of the chromosome is shown in purple. 531 The region with detected introgressions from M. Glaziovii is shown in red. We tested 532 each interval at a significance level of α/n, where α = 0.05 and n = 506. 533 534  (Figure 1). We observed that introgressions