Tetrad analysis without tetrad dissection: Meiotic recombination and genomic diversity in the yeast Komagataella phaffii (Pichia pastoris)

Komagataella phaffii is a yeast widely used in the pharmaceutical and biotechnology industries, and is one of the two species that were formerly called Pichia pastoris. However, almost all laboratory work on K. phaffii has been done on strains derived from a single natural isolate, CBS7435. Little is known about the genetic properties of K. phaffii or its sequence diversity. Genetic analysis is difficult because, although K. phaffii makes asci with four spores, the spores are small and tend to clump together, making the asci hard to dissect. Here, we sequenced the genomes of all the known isolates of this species and found that K. phaffii has been isolated from nature only four times. We analyzed the meiotic recombination landscape in a cross between auxotrophically marked strains derived from two isolates that differ at 44,000 single nucleotide polymorphism sites. We conducted tetrad analysis by exploiting the fact that haploids of this species do not mate in rich media, which enabled us to isolate and sequence the four types of haploid cell present in the colony that forms when a tetratype ascus germinates. We found that approximately 25 crossovers occur per meiosis, 3.5 times fewer than in Saccharomyces cerevisiae. Recombination is suppressed, and genetic diversity among natural isolates is low, in a region around each centromere that is much larger than the centromere itself. Our method of tetrad analysis without tetrad dissection will be applicable to other species whose spores do not mate spontaneously after germination.

Author summary

To better understand the basic genetics of the budding yeast Komagataella phaffii, which has many applications in biotechnology, we investigated its genetic diversity and its meiotic recombination landscape.
We made a genetic cross between strains derived from two natural isolates, and developed a method for characterizing the genomes of the four spores produced by each meiosis, which could not previously be isolated. We found that K. phaffii has a lower recombination rate than Saccharomyces cerevisiae. K. phaffii also shows a large zone of suppressed recombination around its centromeres, which may be due to the structural differences between the centromeres of K. phaffii and S. cerevisiae.


Nucleotide diversity in natural isolates of K. phaffii

We obtained five isolates of K. phaffii from the NRRL culture collection and sequenced their genomes. For convenience we refer to them as Pp1-Pp5; their strain numbers in the NRRL and CBS culture collections are given in Table 1. As far as we know, no other isolates of this species are available from public culture collections apart from CBS7435 [4,32]. We mapped the Illumina reads from each strain to the CBS7435 reference genome sequence [33] using BWA, and identified variant sites using the GATK SNP (single nucleotide polymorphism) calling pipeline. We included strain GS115 in this analysis, using the genome sequence reported by Love et al. [5]. A total of 64,019 variable SNP sites were identified among the six strains relative to CBS7435 (Table 1), and a phylogenetic tree of the strains was constructed from the variable sites (Fig 2).
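As a rough illustration of how variable sites and pairwise strain differences can be tallied from per-strain variant calls, here is a minimal Python sketch. It assumes the calls have already been reduced, for each haploid strain, to a mapping from (chromosome, position) to the non-reference allele, with absent sites treated as matching CBS7435. The input layout and the function names `variable_sites` and `pairwise_differences` are hypothetical, and the text does not state which distance or tree-building method was used for Fig 2.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical input layout: strain name -> {(chromosome, position): alt allele}.
# Sites missing from a strain's dict are assumed to carry the reference
# (CBS7435) allele. Strains are haploid, so there is one allele per site.


def variable_sites(strains: dict[str, dict[tuple[str, int], str]]) -> set[tuple[str, int]]:
    """Return every site at which at least one strain differs from the reference."""
    sites: set[tuple[str, int]] = set()
    for calls in strains.values():
        sites.update(calls)  # iterating a dict yields its (chromosome, position) keys
    return sites


def pairwise_differences(
    strains: dict[str, dict[tuple[str, int], str]],
) -> dict[tuple[str, str], int]:
    """Count, for each pair of strains, the sites at which their alleles differ."""
    diffs: dict[tuple[str, str], int] = {}
    for a, b in combinations(sorted(strains), 2):
        calls_a, calls_b = strains[a], strains[b]
        # .get() returns None for a missing call, standing for the reference allele.
        diffs[(a, b)] = sum(
            calls_a.get(site) != calls_b.get(site) for site in set(calls_a) | set(calls_b)
        )
    return diffs


if __name__ == "__main__":
    toy = {
        "A": {("chr1", 10): "T"},
        "B": {("chr1", 10): "T", ("chr2", 5): "G"},
        "C": {},
    }
    print(len(variable_sites(toy)))  # 2
    print(pairwise_differences(toy))  # {('A', 'B'): 1, ('A', 'C'): 1, ('B', 'C'): 2}
```

A pairwise difference matrix like this is one common starting point for a distance-based tree; character-based methods would instead use the per-site alleles directly.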

Strains Pp1 and Pp3 were found to be essentially identical to CBS7435, differing from it by only 3 or 4 nucleotides in the 9.4 Mb nuclear genome. Pp3 (NRRL Y-7556 / CBS2612) is a natural isolate from exudate of a black oak tree, and was designated by Kurtzman [32] as the type strain of K. phaffii. Pp1 and CBS7435, which were both deposited in culture collections by petroleum researchers (Table 1), seem to be duplicate accessions of this natural isolate. The mutagenized strain GS115 differs from its parent CBS7435 at 69 sites [5]. [...] (Table 1). Pp2 and Pp4 are also quite divergent from each other (Fig 2). The SNP density, at 1.66 to 4.66 SNPs per kb, is lower than that seen in wild isolates of S. cerevisiae (e.g., the average nucleotide diversity among wild isolates from China is 8.08 differences/kb [34]).

Recovery of haploid segregants

We selected two of the most divergent strains from the phylogenetic tree, namely GS115 and Pp4, and crossed them in order to examine the meiotic recombination landscape in K. phaffii.

GS115 has a his4 mutation that makes it auxotrophic for histidine. We made a derivative of Pp4 that is auxotrophic for arginine (Pp4Arg-) by replacing the native ARG4 gene with a ble cassette conferring resistance to zeocin (ZeoR). The HIS4 and ARG4 genes are both located [...]

Diploid cells generated from the GS115 x Pp4Arg- cross were identified by selecting for growth on media lacking His and Arg. Good mating was observed after incubating the cross for 3 days on diploid selection medium, resulting in a confluent patch of cells at the junction of the streaked parental strains (S1 Fig). The diploid strain was sporulated and tetrad formation was confirmed by microscopy, but we were unable to dissect the tetrads using a Singer Sporeplay dissection microscope.

Because K. phaffii cells do not mate on rich (YPD) media, we reasoned that if an ascus is placed intact on YPD, so that its four spores germinate in situ, the resulting colony should contain a mixture of four different types of haploid cell that are the mitotic descendants of the four spores (Fig 3). The four types of cell can be isolated simply by streaking out the original ascus-derived (mixed) colony, so that single cells initiate new colonies, each of which will have a homogeneous genotype that can be identified by replica plating onto appropriate media. Because our diploid was a double heterozygote (HIS4/his4 ARG4/arg4), it should produce three types of asci depending on how these markers segregate: parental ditypes (PD), non-parental ditypes (NPD), and tetratypes. In tetratype asci, which are formed if a single crossover occurs between the HIS4 and ARG4 loci, each spore has a different genotype and all four possible combinations of the two markers are present. Therefore, a tetratype ascus should produce four phenotypes after colonies are streaked out (His+ Arg+, His- Arg+, His+ Arg-, and His- Arg-), so we can identify colonies that are mitotic descendants of each of the four spores by replica plating onto appropriate media (Fig 3). In contrast, PD and NPD asci both produce only two colony phenotypes, so they cannot be used to identify descendants of all four spores.
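The PD/NPD/tetratype logic above can be made concrete with a short Python sketch. It encodes each spore as a pair of booleans (functional HIS4, functional ARG4) and takes the two parental combinations from the cross described above (GS115 is his4 ARG4; Pp4Arg- is HIS4 arg4). The encoding, the labels "PD", "NPD" and "TT", and the function name `classify_ascus` are illustrative choices, not part of the original protocol.

```python
from collections import Counter

# Each spore is encoded as (has functional HIS4, has functional ARG4).
GS115_TYPE = (False, True)    # his4 ARG4 -> His- Arg+
PP4_ARG_TYPE = (True, False)  # HIS4 arg4 -> His+ Arg-


def classify_ascus(spores: list) -> str:
    """Classify a four-spored ascus as parental ditype, non-parental ditype or tetratype."""
    if len(spores) != 4:
        raise ValueError("expected exactly four spores")
    # Each heterozygous marker should segregate 2:2 among the four spores.
    if sum(his for his, _ in spores) != 2 or sum(arg for _, arg in spores) != 2:
        raise ValueError("a marker does not segregate 2:2")
    counts = Counter(spores)
    if len(counts) == 4:
        return "TT"  # all four marker combinations are present
    if counts == Counter({GS115_TYPE: 2, PP4_ARG_TYPE: 2}):
        return "PD"  # only the two parental combinations
    # With 2:2 segregation at both markers, the only remaining case is two
    # copies each of the recombinant types His+ Arg+ and His- Arg-.
    return "NPD"


if __name__ == "__main__":
    print(classify_ascus([(False, True), (True, False), (True, True), (False, False)]))  # TT
```

Only the "TT" class yields four distinguishable phenotypes, which is why the approach described here depends on tetratype asci.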

Following this logic, we isolated four-spored asci from the sporulated culture using the micromanipulator and placed them, without dissection, onto YPD agar so that each ascus germinated into a colony. We then streaked out these colonies to obtain new colonies initiated by single cells, patched the new colonies onto fresh YPD, and replica-plated them to assess their His and Arg phenotypes, looking for tetratypes. In this manner, we successfully recovered the four meiotic products from five tetratype asci (S1 Fig). Six other asci yielded three of the four expected phenotypes ('trio' asci), but we were unable to recover the fourth phenotype even after screening approximately 70 colonies from each of these asci. The absence of the fourth genotype in the trios may be due to epistatic interactions between loci from the Pp4 and GS115 genetic backgrounds, in the particular combinations formed in some spores, causing one spore to fail to germinate.

High-resolution mapping of meiotic recombination in K. phaffii

We sequenced the genomes of 38 segregants: 4 segregants from each of 5 tetrads, and 3 segregants from each of 6 trios. Each genome was sequenced to approximately 100x Illumina coverage, and the reads were used to genotype each segregant at every SNP site between Pp4 and the CBS7435 reference genome (which is almost identical to GS115; see Methods). Data analysis was carried out on 43,708 SNP markers. The median distance between consecutive markers in our cross is 96 bp, which is comparable to the S. cerevisiae [...]

We identified a total of 280 crossovers from the 11 meioses (Table 2; S3 Data). The mean number of crossovers per meiosis is 25.5, which is 3.5-fold lower than the average in S. cerevisiae (90.5; [23]). The number varies about threefold (from 11 to 37) among the asci, with no systematic difference between tetrads and trios.
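The median inter-marker distance quoted above is a simple summary of the marker map. A minimal Python sketch of how it can be computed is shown below, assuming marker positions are grouped by chromosome so that no gap is measured across a chromosome boundary; the input layout and the function name `median_marker_spacing` are hypothetical.

```python
from __future__ import annotations

from statistics import median


def median_marker_spacing(positions_by_chrom: dict[str, list[int]]) -> float:
    """Median distance (bp) between consecutive SNP markers on the same chromosome."""
    gaps: list[int] = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        # Pair each marker with the next one; no gap spans two chromosomes.
        gaps.extend(right - left for left, right in zip(ordered, ordered[1:]))
    if not gaps:
        raise ValueError("need at least two markers on some chromosome")
    return median(gaps)


if __name__ == "__main__":
    print(median_marker_spacing({"chr1": [100, 250, 200], "chr2": [10, 110]}))  # 100
```

The headline crossover figures follow from equally simple arithmetic on the numbers in the text: 280 crossovers / 11 meioses is about 25.5 per meiosis, and 90.5 / 25.5 is about 3.5.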
The number of crossovers per meiosis in K. phaffii is also lower than in S. paradoxus (54.8), but similar to Sch. pombe (26.6) and L. kluyveri (19.9) [25, 27, 35].

Crossover frequency correlates with chromosome size

We found on average one crossover per 369 kb across the genome, a figure that is consistent among the four chromosomes (Table 2). The average number of crossovers per meiosis on each chromosome in our data has a linear correlation with chromosome size (Fig 5), in agreement with the pattern seen in a variety of other fungal genomes [22,25,36,37].
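The linear relationship in Fig 5 can be summarized by a least-squares line, with chromosome size as x and mean crossovers per meiosis on that chromosome as y (the per-chromosome values are in Table 2 and are not reproduced here). Below is a dependency-free Python sketch of such a fit; it assumes ordinary least squares, since the text does not say which fitting routine was used, and the function names are hypothetical. The second helper reproduces the genome-wide spacing from numbers given in the text.

```python
from __future__ import annotations


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def kb_per_crossover(genome_kb: float, crossovers_per_meiosis: float) -> float:
    """Average physical distance between crossovers in one meiosis."""
    return genome_kb / crossovers_per_meiosis


if __name__ == "__main__":
    # 9.4 Mb genome and 25.5 crossovers per meiosis, as stated in the text.
    print(round(kb_per_crossover(9400, 25.5)))  # 369
```

A positive intercept in such a fit means that even a very short chromosome is expected to receive some crossovers, which is how the intercept is read in the following paragraph.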

The trend line for K. phaffii has an intercept of 1.37 crossovers, in agreement with the occurrence of one obligatory crossover per chromosome, which follows from the essential role that crossovers play in chromosome segregation. Every chromosome sustained at least one crossover in every meiosis (Table 2), except for chromosome 4 in Trio 19, for which chromosome 4 in all recovered spores was derived from the Pp4 parent (i.e. 3:1 or 4:0 segregation) and thus did not have any crossovers. [...] (Table 2; S3 Data). The ratio of crossovers to NCOs in K. phaffii is 1.3, which is lower than the previously reported ratios of 2.0 in both S. cerevisiae and S. paradoxus [23,27], and 3.2 in L. kluyveri [25].
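Segregation patterns such as 2:2, 3:1 and 4:0 can be read directly off the 0/1 genotype codes of the four segregants at each SNP (0 for the GS115 allele, 1 for the Pp4 allele, as in Methods). The Python sketch below classifies each site and groups consecutive non-2:2 sites into candidate conversion tracts. It is a simplified illustration, not the pipeline behind S3 Data: it does not distinguish 3:1 from 1:3, and the function names are hypothetical.

```python
from __future__ import annotations


def segregation_pattern(genotypes: list[int]) -> str:
    """Classify one tetrad SNP site coded 0 (GS115 allele) or 1 (Pp4 allele)."""
    if len(genotypes) != 4 or any(g not in (0, 1) for g in genotypes):
        raise ValueError("expected four genotypes coded 0 or 1")
    pp4 = sum(genotypes)
    majority = max(pp4, 4 - pp4)
    # 2 -> Mendelian 2:2; 3 -> 3:1 (or 1:3); 4 -> 4:0 (or 0:4).
    return {2: "2:2", 3: "3:1", 4: "4:0"}[majority]


def non_mendelian_runs(sites: list[tuple[int, list[int]]]) -> list[tuple[int, int, str]]:
    """Group consecutive non-2:2 sites into (first_pos, last_pos, pattern) runs.

    `sites` must be sorted by position along one chromosome. A 2:2 site, or a
    change of pattern, ends the current run.
    """
    runs: list[list] = []
    current = None
    for pos, genotypes in sites:
        pattern = segregation_pattern(genotypes)
        if pattern == "2:2":
            current = None  # a Mendelian site closes any open run
            continue
        if current is not None and current[2] == pattern:
            current[1] = pos  # extend the open run
        else:
            current = [pos, pos, pattern]
            runs.append(current)
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    demo = [(100, [0, 0, 1, 1]), (200, [1, 1, 1, 0]), (300, [1, 1, 1, 0]), (400, [0, 1, 0, 1])]
    print(non_mendelian_runs(demo))  # [(200, 300, '3:1')]
```

Keeping only the runs, rather than single sites, reflects the fact that a conversion tract usually spans several adjacent markers; deciding whether a run is an NCO or part of a crossover still requires looking at which chromatids switch.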

Examining the 110 crossovers in our five complete tetrads (Table 2) [...]

In L. kluyveri, Brion et al. [25] observed a high frequency of 4:0 segregation and 'double crossovers' (two crossovers at the same site, on two pairs of chromatids in a tetrad). They interpreted this pattern as evidence of 'return to mitotic growth' (RTG), a process in which meiosis is initiated, abandoned, and then re-initiated after some rounds of mitotic division [25]. In contrast, we did not observe any evidence of double crossovers in our K. phaffii data, and found only one region with 4:0 segregation. There were two potential double-crossover sites (Tetrad 1, chr. 1 at 781604-784124; and Tetrad 20, chr. 4 at 1125099-1131625), but on further examination we found each of them to be two independent single crossovers that were very close together. The Tetrad 1 site includes a 346-bp region containing 9 SNP sites that segregate in a 4:0 pattern, probably due to the overlap of two adjacent 3:1 conversion tracts caused by two close but independent crossovers (Fig 4). We also observed four sites in which a crossover between two chromatids coincided with a patch of gene conversion on a third chromatid, a situation called 'Type II tetrads' by Liu et al. [27]. We scored these sites as both a crossover and an NCO (S3 Data).

Our experimental design made a crossover in the interval between the ARG4 and HIS4 markers on chromosome 1 obligatory in each of the 11 asci. In addition to these 11 obligatory crossovers, two asci also had a second crossover in this interval, involving a different pair of chromatids for each crossover. The locations of the 13 crossovers are not randomly distributed along the 421-kb interval between ARG4 and HIS4, which includes the centromere (CEN1). Instead, all the crossovers are close to HIS4, and they seem to avoid the centromere region (Fig 6A).
The hypothesis that crossovers are distributed uniformly in the interval is rejected by a Kolmogorov-Smirnov test (P = 1.6 × 10⁻⁶). The pattern suggests that crossing over near the centromere is suppressed.
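The uniformity test above compares the observed crossover positions with a uniform distribution over the interval. Below is a dependency-free Python sketch of the one-sample Kolmogorov-Smirnov statistic D, with an asymptotic p-value that uses the small-sample correction given in Numerical Recipes. It is an assumption about how such a test could be computed, not the authors' code; the function names are hypothetical. If SciPy is available, `scipy.stats.kstest` gives the same D statistic, but its p-value can differ because SciPy may use an exact small-sample distribution.

```python
import math


def ks_uniform_statistic(positions: list, start: float, end: float) -> float:
    """One-sample Kolmogorov-Smirnov D for positions against Uniform(start, end)."""
    if not positions or end <= start:
        raise ValueError("need positions and an interval with end > start")
    xs = sorted((p - start) / (end - start) for p in positions)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Largest gap between the empirical CDF (just before and at x) and the uniform CDF.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


def ks_asymptotic_pvalue(d: float, n: int, terms: int = 100) -> float:
    """Approximate P(D >= d) with the Kolmogorov series and a small-n correction."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here, and the p-value is essentially 1
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    d = ks_uniform_statistic([10, 20], 0, 100)
    print(d, ks_asymptotic_pvalue(d, 2))
```

In practice the positions would be the crossover coordinates within the ARG4-HIS4 interval, with `start` and `end` set to the interval boundaries.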

To investigate whether other crossovers are also suppressed near centromeres, we plotted the locations of all 269 non-obligatory crossovers in our dataset as a function of their distance from centromeres, both for the whole genome (Fig 6B) and for each chromosome [...]

To examine this pattern in more detail, we plotted SNP diversity in the three natural isolates Pp4, Pp2 and Pp5, in 1-kb bins, relative to the CBS7435 reference genome (Fig 7). Pp4 and [...]

There are many protein-coding genes in the low-diversity regions, including ARG4, and the gene density there is the same as in the rest of the genome. All the crossovers in the ARG4-HIS4 interval occurred outside the low-diversity region around CEN1 (Fig 6A).

The NRRL strains of K. phaffii used in this study (Table 1) [...]

The cross of parental strains GS115 and Pp4Arg- was carried out by making parallel streaks of the two parental strains on YPD, and then velvet replica plating these streaks onto a mating plate twice, at right angles, so that they intersected as a grid [10]. The mating plate contained NaKG agar medium (0.5% sodium acetate, 1% potassium chloride, 1% glucose, 2% Bacto agar) supplemented with L-histidine (50 mg l⁻¹) and L-arginine (100 mg l⁻¹) (NaKG +His +Arg). The mating plate was incubated for 2 days at room temperature and subsequently replica plated onto diploid selection medium (SC -His -Arg: SD medium supplemented with -Arg/-His drop-out mix). Diploids were incubated for 3 days at 30 °C, and streaked a second time for phenotype confirmation and generation of single colonies. [...] to identify the 4 (or 3) meiotic products originating from tetratype asci. Asci that yielded only two phenotypes were considered to be PD or NPD and discarded. It is important to note that K. phaffii spores tend to clump together, as indicated by the high number of colonies showing prototrophic phenotypes (S1 Fig; [10]).
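The phenotype-based triage of ascus-derived colonies can be sketched as follows in Python. The input format (a colony identifier plus growth or no growth on SC -His and SC -Arg replica plates) and the function names are hypothetical. Because a clump of His+ Arg- and His- Arg+ cells can grow on both plates, any His+ Arg+ call in this sketch should still be confirmed by re-patching.

```python
from __future__ import annotations


def phenotype(grows_without_his: bool, grows_without_arg: bool) -> str:
    """Name a colony's phenotype from growth on SC -His and SC -Arg replica plates."""
    his = "His+" if grows_without_his else "His-"
    arg = "Arg+" if grows_without_arg else "Arg-"
    return f"{his} {arg}"


def triage_ascus(colonies):
    """Group single-cell colonies derived from one ascus colony by phenotype.

    `colonies` is an iterable of (colony_id, grows_on_minus_his, grows_on_minus_arg).
    Returns (status, groups), where groups maps phenotype -> list of colony ids.
    """
    groups: dict[str, list[str]] = {}
    for colony_id, grows_minus_his, grows_minus_arg in colonies:
        groups.setdefault(phenotype(grows_minus_his, grows_minus_arg), []).append(colony_id)
    if len(groups) == 4:
        status = "tetratype"  # one phenotype per spore
    elif len(groups) == 3:
        status = "trio"  # may also mean that more colonies need to be screened
    else:
        status = "PD/NPD (discard)"
    return status, groups


if __name__ == "__main__":
    ascus = [("c1", True, True), ("c2", False, True), ("c3", True, False), ("c4", False, False)]
    print(triage_ascus(ascus)[0])  # tetratype
```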
Therefore, segregants from all candidate tetratype asci were re-patched onto all diagnostic media to confirm their phenotypes before genome sequencing. [...] [5] were removed from these lists; they are an artefact of our decision to use CBS7435 as the reference genome for alignment (because it had a better assembly and annotation), whereas GS115 was the actual parent in our cross.

Crossover detection

A bespoke Java program, taking the joint genotype calls as input, was used to list the genotypes of the 4 segregants in each tetrad (or 3 segregants in each trio) at every SNP site (S1 Data). Genotypes of segregants at SNP sites were coded as 0 (for a nucleotide matching GS115) or 1 (for a nucleotide matching Pp4). Sites called as indels or heterozygous were discarded. Genotype lists for each tetrad were input into the plotting tool in the ReCombine [...]

NRRL, Northern Regional Research Laboratory, US Department of Agriculture. CBS, Centraalbureau voor Schimmelcultures (Westerdijk Institute), The Netherlands.
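The 0/1 coding described under Crossover detection makes changes of parental origin easy to locate. The Python sketch below codes each call for one segregant, drops indel, heterozygous or otherwise unexpected calls (represented as None), and reports each interval between consecutive informative markers where the parental origin switches. It is a hypothetical illustration, not the bespoke Java program or ReCombine, and the function names are invented for this example. In a tetrad, a reciprocal crossover would appear as a switch in the same interval in two segregants.

```python
from __future__ import annotations


def code_genotype(call: str, gs115_allele: str, pp4_allele: str) -> int | None:
    """Code a haploid call as 0 (GS115 allele), 1 (Pp4 allele) or None (discarded)."""
    if call == gs115_allele:
        return 0
    if call == pp4_allele:
        return 1
    return None  # e.g. an indel, heterozygous or otherwise unexpected call


def switch_intervals(
    positions: list[int], genotypes: list[int | None]
) -> list[tuple[int, int, int, int]]:
    """List (left_pos, right_pos, from_code, to_code) for each change of parental origin.

    Positions must be sorted along one chromosome. None genotypes are skipped, so
    each switch is bracketed by the nearest informative markers on either side.
    """
    informative = [(p, g) for p, g in zip(positions, genotypes) if g is not None]
    return [
        (p1, p2, g1, g2)
        for (p1, g1), (p2, g2) in zip(informative, informative[1:])
        if g1 != g2
    ]


if __name__ == "__main__":
    print(switch_intervals([100, 200, 300, 400], [0, 0, None, 1]))  # [(200, 400, 0, 1)]
```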