Investigating expression of a human optimized cas9 transgene in Neurospora crassa

The CRISPR-associated Cas9 enzyme is used in molecular biology to engineer the genomes of a wide range of organisms. While Cas9 can be injected or transfected into a target cell to achieve the desired goal, there are situations where stable expression of Cas9 within a target organism is preferable. Here, we show that the model filamentous fungus Neurospora crassa is recalcitrant to heterologous expression of a human-optimized version of Streptococcus pyogenes cas9. Furthermore, partial optimization of cas9 by synonymous codon exchange failed to improve its expression in the fungus. Finally, we show that transgene expression can be detected when cas9Hs sequences are placed in the 3’ UTR regions of transgene-derived mRNAs, but not when the same sequences are in the translated part of the transgene-derived mRNA. This finding suggests that the primary obstacle to high cas9Hs expression levels in N. crassa is translational in nature.


INTRODUCTION
The availability of CRISPR-associated (Cas) systems for use in genome engineering has accelerated research on organisms that were historically difficult to engineer with traditional techniques (Knott and Doudna 2018;Wang et al. 2020;Zhu et al. 2020). Cas technology has also accelerated the field of gene driver research, where it offers the possibility of engineering synthetic gene drivers for the control of disease spreading organisms (Esvelt et al. 2014). For these reasons, we sought to establish a robust Streptococcus pyogenes Cas9-based system for use in the model filamentous fungus Neurospora crassa.

Strains and media
Strains are listed in Table 1. Vogel's minimal medium (VMM) (Vogel 1956) with and without 2% agar was used for vegetative propagation of all strains. L-histidine was added to VMM at 500 mg / L when needed to support growth of his-3 strains. Hygromycin B (GoldBio, was included in media at concentrations of 200 μg / ml to select for hygromycin-resistant transformants. Crosses were performed on synthetic crossing medium (pH 6.5) with 1.5 % sucrose (Westergaard and Mitchell 1947).

N. crassa transformation
Transformation of N. crassa was performed as described by Rhoades et al. (2019b), except that the recovery step was omitted when transformants were selected for histidine prototrophy. N. crassa strain P8-42 was transformed with pAH41.4, a plasmid containing tef-1(p)-cas9 Hs .
Transformants were selected for resistance to hygromycin B. Plasmids pNR206.2 and pNR207.1 contain gfp-cas9 NcC213 and gfp-cas9 NcC60 transgenes, respectively, and these plasmids were used to transform strain P8-43. Transformants were selected for histidine prototrophy. Homokaryotic strains were isolated from transformants by single spore purification (from conidium or ascospores) and confirmed to be homokaryotic with polymerase chain reaction (PCR)-based genotyping assays.

Transgene construction and insertion
tef-1(P)-cas9 Hs : A tef-1(P)-cas9 Hs -containing plasmid (pAH41.4) was constructed by amplifying the N. crassa tef-1 promoter region (tef-1[P]) from N. crassa genomic DNA with primers 598 and 599 by PCR and inserting the PCR product into the SpeI site of Plasmid 43802 (Addgene, DiCarlo et al. 2013). The resulting plasmid was used as the template for PCR-based amplification of tef-1(P)-cas9 Hs with primers 610 and 589. The PCR product was inserted into the NotI site of the his-3-targeting plasmid pTH1150.11 (GenBank MN872812.1) to produce plasmid pAH41.4. Primer sequences are listed in Table S1. Sequences of N. crassa genes, such as tef-1 and the tef-1 promoter region, were obtained from FungiDB (Stajich et al. 2012).
gfp-cas9 Hs : The gfp-cas9 Hs and gfp-stop-cas9 Hs transgenes, including all truncated versions (e.g., gfp-cas9 HsC1209 , gfp-stop-cas9 HsC1078 , etc.) were constructed by double-joint PCR (DJ-PCR) (Hammond et al. 2011) with primers and templates described in Tables S1 and S2. gfp-cas9 NcC213 : Primers 2021 and 2022 were used to amplify a single PCR product containing hph, ccg-1(P) (McNally and Free 1988), gfp, and a (GA)5-linker from plasmid pTH1117.12 (GenBank JF749202.1). Primers 1980 and 624 were used to amplify a single PCR product containing cas9 Nc213 from pNR177.2 (described below). The two products were fused by PCR, and the fusion product was used as a template for amplification with primers 2024 and 2025. The amplified product was digested with NotI and cloned into the NotI site of the his-3targeting plasmid pTH1150.11 to create plasmid pNR206.2.
gfp-cas9 NcC60 : The gfp-cas9 NcC60 transgene was assembled similarly to the gfp-cas9 NcC213 transgene except that primer 1980 was exchanged for primer 1983. The final amplification product was cloned into the NotI site of pTH1150.11 to create plasmid pNR207.1.

Codon optimization
We used a publicly available RNAseq dataset (SRR1055991, Wu et al. 2014) to identify 100 of the most highly expressed protein-coding genes in the N. crassa nuclear genome (Table S3) by aligning RNA sequences to all predicted N. crassa protein coding genes as described in Samarajeewa et al. (2017). Genes with the highest "reads per kilobase exon model" (RPKM) values (Mortazavi et al. 2008) were considered to be the most highly expressed. Relative adaptiveness (RA) values (Sharp and Li 1987) were then calculated for each codon (Table S4). RA values were calculated by dividing the observed frequency of each codon in the set of 100 highly expressed genes by the frequency of the most common synonymous codon in the same set of genes. Codon optimization of cas9 for expression in N. crassa was then performed by replacing cas9 Hs codons in cas9 Hs as follows: 1) the synonymous codon with the highest RA value was used for all occurrences of amino acids C, E, F, H, I, K, L, M, N, Q, W, and Y, as well as the stop codon; 2) the synonymous codon with the second highest RA value was used for all occurrences of A, D, G, P, and R; and, 3) a combination of synonymous codons with either the first or second highest RA values was used for amino acids S, T, and V. Use of the second most optimal codon for some amino acids helped reduce the overall GC content of the optimized sequence. Reducing GC content allowed the optimized sequence to be synthesized as a gBlock® (Integrated DNA technologies). A gBlock® DNA fragment containing the optimized cas9 Nc sequence ( Figure S1) was cloned to pJET1.2 to create pNR177.2.

Visualization of GFP in conidia by fluorescence microscopy
Fluorescence microscopy and imaging was performed with a Leica DMBRE microscope or a Leica SP8 confocal microscope. For imaging with a Leica DMBRE system, N. crassa cultures were incubated on VMM for 1-2 days at 32°C followed by 2-4 days at room temperature.
Conidial suspensions were placed on a standard microscope slide for imaging. A 40× objective was used. GFP-signal was collected with a 20 second exposure for all strains. Cropped raw images were used without additional modifications. For imaging with the Leica SP8 confocal microscope, the same growth conditions were followed as stated for imaging with the Leica DMBRE system. Conidial suspensions were prepared, and then incubated in a 37°C shaker for 2 hours before imaging. Conidial suspensions were placed on a standard microscope slide for imaging. All the images were acquired with the same settings so that they could be directly compared. A 63×/1.40 oil objective was used, with a white light laser set to a wavelength of 488 nm. Fluorescence was detected between the wavelengths of 500-560nm, with 8× line averaging.
Images were assembled in Photoshop; GFP is shown with original contrast.

Quantification of GFP in conidia by flow cytometry
Fresh N. crassa cultures were prepared by qualitative transfer of conidia to 2.5 ml of solid VMM slants in 16×100 mm glass culture tubes. The inoculated culture tubes were incubated for two days in a 32C incubator and 5-6 days at room temperature on a laboratory bench top. Sterile wood applicators were used to transfer conidia to 1× PBS (137 mM NaCl, 2,7 mM KCl, 10mM Na2HPO4, 1.8 mM KH2PO4, pH 7.4) ( Figure S2). Conidial suspensions were then analyzed with a BD FACSMelody system (equipped with 488 nm, 561 nm, and 640 nm lasers) and FACSChorus software. Conidia were gated on FSC and SSC and MFI of GFP histograms were analyzed using a 1.5 neutral density filter. Raw MFI values are provided in Table S5.

A cas9 Hs transgene is poorly expressed in N. crassa
We began our studies by constructing a cas9 Hs transgene (Fig. 1A) and inserting it downstream of his-3 on chromosome I (Fig. 1B). We then constructed a second transgene, gfp-cas9 Hs , to express a GFP-Cas9 fusion protein ( Fig. 1, C-E). This second transgene was constructed to allow us to determine if Cas9, which includes an SV40 nuclear localization signal on its Cterminus, localizes to the N. crassa nucleus. However, when we examined the gfp-cas9 Hs strain by fluorescence microscopy, we failed to detect a GFP signal (Fig 2, A-B). Because we focused our assays on N. crassa conidia (asexual spores), we considered the possibility that the promoter used to drive expression of gfp-cas9 Hs (ccg-1[P]) was insufficient for expression of the fusion protein in this cell type. Thus, as a control, we examined conidia from a gfp-sad-6 transgenecarrying strain. SAD-6 is a meiotic silencing by unpaired DNA (MSUD) protein (Samarajeewa et al. 2014) and it was chosen as a control for Cas9 expression because it is larger than Cas9 (210 kD vs 159 kD) and because, like the gfp-cas9 Hs transgene, the gfp-sad-6 transgene is driven by the ccg-1 promoter. In contrast to GFP-Cas9, GFP-SAD-6 was easily detected by fluorescence microscopy ( Figure S3, compare A and B). These observations suggest that the ccg-1 promoter should be sufficient for expression of GFP-Cas9 in N. crassa conidia.

Partial codon optimization of cas9 for expression in N. crassa
The above results suggest that cas9 Hs sequences are poorly expressed in N. crassa. To identify a segment of cas9 Hs that could be expressed more robustly in the organism, and concomitantly identify regions of cas9 Hs that prevent its robust expression, we dissected the cas9 Hs coding region with a series of gfp-cas9 HsC# transgenes (Fig. 1D). "C#" specifies the number of Cas9 amino acids in the GFP-tagged and N-terminally truncated Cas9 protein (for example, cas9 HsC60 encodes only the C-terminal 60 amino acids of Cas9. The number "60" does not include the seven amino acids of the C-terminal NLS). We constructed 16 of these gfp-cas9 HsC# transgenes and examined GFP levels in conidia from the gfp-cas9 HsC# transgene-carrying strains by flow cytometry. Interestingly, we found that all transgenes containing more than 212 cas9 Hs codons failed to produce a GFP signal at detectable levels and that GFP levels increased rapidly as the number of cas9 Hs codons approached zero ( We considered the possibility that robust expression of cas9 Hs in N. crassa is restricted by the organism's synonymous codon biases. To test this hypothesis, we altered the cas9 coding sequence in the gfp-cas9 HsC213 and gfp-cas9 HsC60 transgenes to produce equivalent gfp-cas9 NcC213 and gfp-cas9 NcC60 transgenes ( Figure S3). Specifically, we replaced codons that are found at low frequency in highly expressed N. crassa mRNAs with synonymous codons that are found at high frequency in N. crassa mRNAs (see methods). Interestingly, despite replacing 120 of 220 codons to make cas9 NcC213 (213 cas9 Hs codons, 7 nls codons, and 1 stop codon) and replacing 40 of 68 codons to make cas9 NcC60 (60 cas9 Hs codons, 7 nls codons, and 1 stop codon), neither transgene produced more GFP fusion protein than their human-optimized counterparts (Fig. 4).

Translation of cas9 Hs sequences negatively influences expression of gfp-cas9 Hs transgenes
The tendency of GFP levels to increase as the length of cas9 Hs -coding sequence decreases suggests that low GFP levels are due to problems encountered during translation of cas9 Hs sequences by N. crassa ribosomes. To test this hypothesis, we constructed a set of gfp-stop-cas9 HsC# transgenes. These transgenes are identical to the gfp-cas9 HsC# transgenes except that they contain a stop codon between the gfp and cas9 Hs coding sequences. Interestingly, we found that the placement of a stop codon between the gfp and cas9 Hs coding sequences increased the level of GFP produced by all transgenes, but not in a completely uniform manner ( Figure 3B and Table 3). For example, GFP levels appeared to increase around two maxima, one represented by the gfp-stop-cas9 HsC656 transgene and the other by the gfp-stop-cas9 HsC4 transgene ( Figure 3B).

DISCUSSION
In this study, we have presented evidence demonstrating that a human-optimized cas9 Hs transgene is poorly expressed in N. crassa. Our major findings are as follows: 1) a gfp-cas9 Hs transgene is poorly expressed relative to a control transgene gfp-sad-6, 2) removing the majority of cas9 Hs coding sequences from the gfp-cas9 Hs transgene (e.g., with gfp-cas9 HsC# transgenes) improves expression levels, 3) increasing the number of N. crassa-"preferred" codons does not improve expression of at least two segments of the cas9 coding region, and 4) cas9 Hs -coding sequences negatively influence transgene expression in a translation dependent manner.
Our data suggest that the relatively large size of a GFP-Cas9 fusion protein should not prevent its detection by our chosen cytological and flow-cytometry-based methods because both methods allowed us to detect a GFP-SAD-6 fusion protein. Importantly, the gfp-cas9 Hs and gfpsad-6 transgenes use the same promoter (ccg-1[P]) and the same gfp coding sequences. This suggests that transcription rates of both transgenes should be relatively similar, as should the efficiency of ribosome loading onto the mRNAs derived from each transgene (e.g., because the mRNAs should have identical 5′ UTRs). However, the dissimilarities between the gfp-cas9 Hs and gfp-sad-6 transgenes are numerous, and they include differences in coding sequences, 3′ UTRs, termination sequences, and locations in the genome. At this point, it is unclear which, if any, of these dissimilar factors are the primary cause of the poor expression of gfp-cas9 Hs in N. crassa.
To gain insight into which regions of the cas9 Hs coding sequence may negatively influence its expression, we systematically removed 5′ segments of increasing length from the cas9 Hs coding sequence with a series of gfp-cas9 HsC# transgenes. With this approach, we found it was necessary to remove nearly all the cas9 Hs coding sequence from a gfp-cas9 HsC# transgene before it would express GFP at detectable levels. One possibility is that mRNAs from transgenes with shorter cas9 Hs coding sequences are found at higher levels in N. crassa than are mRNAs with longer cas9 Hs coding sequences. If true, this could be due to differences in transcriptional efficiency or transcript stability. However, while we have not measured gfp-cas9 Hs mRNA levels in this study, we did measure GFP levels in strains carrying gfp-stop-cas9 Hs transgenes and we found that all gfp-stop-cas9 Hs transgenes express GFP at detectable levels, even if the transgene contained the full length cas9 Hs coding sequence. These findings suggest that gfp-cas9 Hs expression problems are encountered after N. crassa ribosomes finish translating gfp sequence and begin translating cas9 Hs sequences.
Codon optimization has long been considered a useful technique for improving the expression of heterologous sequences in transgenic organisms. In this study, we sought to improve the expression of gfp-cas9 HsC213 and gfp-cas9 HsC60 transgenes by increasing the codon adaptation index (CAI) (Sharp and Li 1987) of each transgene. This codon optimization strategy involves calculating relative adaptiveness (RA) values for each of the 64 possible codons from a set of species-specific high abundance mRNAs and exchanging codons with low RA values for synonymous codons with high RA values. We focused our efforts on gfp-cas9 HsC213 and gfp-cas9 HsC60 because the cas9 Hs coding sequences in both transgenes are relatively short, and even though gfp-cas9 HsC60 expresses GFP, it does so at a level that is an order of magnitude below that of the shortest transgene examined in this study (i.e., gfp-cas9 HsC4 ). Therefore, there appears to be room to improve the expression of both gfp-cas9 HsC213 and gfp-cas9 HsC60 transgenes.
Interestingly, the N. crassa optimized transgenes gfp-cas9 NcC213 and gfp-cas9 NcC60 did not express GFP at higher levels than their human optimized counterparts (Figure 4). This finding suggests that N. crassa codon preferences are neither the major factor preventing expression of gfp-cas9 HsC213 at detectable levels, nor the major factor limiting expression of gfp-cas9 HsC60 .
A possible explanation of the results observed in this study is that the N. crassa translational machinery stalls within the cas9 Hs segments of cas9 Hs -containing mRNAs. To shed light on this possibility, we placed stop codons between gfp and cas9 Hs sequences, effectively placing cas9 Hs "coding" sequences in the 3′ UTR of each transgene-derived mRNA. The presence of cas9 Hs coding sequences within the 3′ UTRs of gfp-stop-cas9 Hs transgene derived mRNAs influenced expression in unexpected ways. For example, while transgene expression was detected for all gfp-stop-cas9 HC# transgenes, it was highest for transgenes containing the smallest number of cas9 Hs codons in the 3′ UTR. This finding is consistent with the faux 3′ UTR model (Amrani et al. 2006;Nicholson et al. 2010;Zhang and Sachs 2015), where abnormally long 3′ UTRs are thought to trigger degradation of the mRNA by nonsense mediated decay.
However, our observations are not completely consistent with the faux 3′ UTR model because transgene levels were not strictly directly proportional to the length of the 3′ UTR. For example, transgene expression levels were higher when the 3′ UTR contained between 796 and 612 cas9 Hs codons than when the 3′ UTR contained between 535 and 213 codons. Currently, we do not have an explanation for this phenomenon. More importantly, we performed this set of experiments to determine if halting translation of gfp-cas9 Hs transgenes before ribosomes entered the cas9 Hs regions of an mRNA would improve their expression, and indeed, we found that it does. GFP production was higher from every gfp-stop-cas9 HsC# transgene than from the equivalent gfp-cas9 Hs counterpart, suggesting that the primary obstacle to cas9 Hs expression in N. crassa occurs during translation of cas9 Hs sequences.
Our investigation of cas9 Hs expression in N. crassa evolved from a desire to study cas9based synthetic gene drivers in filamentous fungi, an area of research that complements our primary focus on fungal meiotic drive elements (Hammond et al. 2012;Harvey et al. 2014;Pyle et al. 2016;Svedberg et al. 2018;Rhoades et al. 2019a). Although our current findings suggest that cas9 Hs sequences are poorly expressed in N. crassa, other researchers have successfully used    Table 2. Raw data is provided in Table S5. (B) Conidia from gfp-stop-cas9 Hs and gfp-stop-cas9 HsC# strains were analyzed for the presence of GFP by flow cytometry as in panel A. MFI values are averages of values obtained from two assays that were completed one day apart. The data used to construct the chart is provided in Table 3. Raw data is provided in Table S5.  Table 2. Raw data is provided in Table S5. Figure S1 cas9 NcC213 sequence. The sequence of cas9 NcC213 was ordered as a gBlock® (Integrated DNA technologies) and inserted into pJET1.2 to create plasmid pNR177.2. The cas9 coding sequence is depicted in red font. There are 221 codons: 213 codons for the cas9 C terminal end, seven codons for the SV40 NLS, and a single stop codon. The remainder of the sequence includes the S. cerevisiae cyc-1 terminator (black font) and a pTH1150.1 plasmid sequence to facilitate transformation vector construction (blue font).

Figure S2
Conidia sampling method. VMM slants were inoculated with N. crassa by qualitative transfer of conidia. After a 7-8 day incubation period (see methods), an approximately 2.5 mm path of conidia, from points "A" to "B" in the diagram, were transferred to tubes containing 1× PBS.

Figure S4
Comparison of cas9 HsC213 and cas9 NcC213 sequences. A sequence alignment of cas9 HsC213 and cas9 NcC213 is shown. A total of 120 codons in cas9 HsC213 were changed to synonymous codons that are found at higher frequencies in highly expressed N. crassa genes.        116.86 a "yes" indicates that the gfp coding sequence and the ten amino acid GAGAGAGAGA linker of the gfp-cas9 transgene were confirmed to be free of mutations in the indicated strain by Sanger sequencing. "no" means the sequence is assumed to be correct but it was not confirmed by Sanger sequencing. b "yes" indicates that the cas9 coding sequence was confirmed to be free of mutations in the indicated strain by Sanger sequencing. "no" means the sequence is assumed to be correct but it was not confirmed by Sanger sequencing. c Average mean fluorescence intensity (MFI) values were calculated from MFI measurements of the same strain taken on different days (replicate assays separated in time by one day). d The cas9 Hs coding sequence in plasmid pAH41.1 was confirmed to be free of mutations by Sanger sequencing but the cas9 Hs coding sequence was not confirmed to be mutation free after integration into the N. crassa genome.
Standard deviation values (stdev) are provided. na, not applicable. a "yes" indicates that the gfp coding sequence, ten amino acid GAGAGAGAGA linker, and stop codon of the gfp-stop-cas9 transgene were confirmed to be free of mutations in the indicated strain by Sanger sequencing. "no" means the sequence is assumed to be correct but it was not confirmed by Sanger sequencing. b "yes" indicates that the cas9 coding sequence was confirmed to be free of mutations in the indicated strain by Sanger sequencing. "no" means the sequence is assumed to be correct but it was not confirmed by Sanger sequencing. c Average MFI values were calculated from measurements of the same strain taken on different days (replicate assays were separated in time by one day). d The cas9 Hs coding sequence in plasmid pAH41.1 was confirmed to be free of mutations by Sanger sequencing but the cas9 Hs coding sequence was not confirmed to be mutation free after integration into the N. crassa genome. e Strain ISU-4938 contains a single nucleotide deletion in the coding sequence for the ten amino acid GAGAGAGAGA linker. The deleted nucleotide causes a frameshift that changes the amino acids at the end of GFP from GAGAGAGAGA to GAGAGRVLERSTCLL.
Standard deviation values (stdev) are provided. na, not applicable. 0.10 a Total is the total number of occurrences of each codon in the coding sequences of the 100 genes listed in Table S3. Relative adaptiveness (RA) values were calculated according to the method of Sharp and Li (1987).