Abstract
Transcription-coupled repair (TCR) removes base damage on the transcribed strand of a gene to ensure a quick resumption of transcription. Based on the absence of key enzymes for TCR and empirical evidence, TCR was thought to be missing in Drosophila melanogaster. The recent demonstration of TCR in S2 cells raises the question about the involved genes. Since the mismatch repair (MMR) pathway serves a central role in TCR, at least in Escherichia coli, we studied the mutational signatures in flies with a deletion of the MMR gene spellchecker1 (spel1), a MutS homolog. Whole-genome sequencing of mutation accumulation (MA) lines obtained 7,345 new single nucleotide variants (SNVs) and 5,672 short indel mutations, the largest data set from an MA study in D. melanogaster. Based on the observed mutational strand-asymmetries, we conclude that TCR is still active without spel1. The operation of TCR is further confirmed by a negative association between mutation rate and gene expression. Surprisingly, the TCR signatures are detected for introns, but not for exons. We propose that an additional exon-specific repair pathway is masking the signature of TCR. This study presents the first step towards understanding the molecular basis of TCR in Drosophila melanogaster.
Background
DNA continuously undergoes a large number of spontaneous chemical modifications leading to DNA damage (1, 2). The damaged bases can cause mutations, block DNA replication, and interfere with transcription (3). To repair some of these adducts, nucleotide excision repair (NER) removes the damaged strand at short distances from both sides of the lesion and a new strand is synthesized to fill the gap (4). NER has two pathways to recognize these lesions: the global genomic repair (GGR) and the transcription-coupled repair (TCR) (5). GGR scans the whole genome and the DNA damage is recognized by helix-distorting lesions (4). TCR detects the base damage from RNA polymerase stalling in actively transcribed DNA (4, 5) and leads to a mutational asymmetry between the strands (6). Transcription inhibition can be dangerous to a cell or even an organism (7, 8) so a quick resumption of transcription is vital. TCR is found in most bacterial species and many eukaryotes (9) and a defective pathway causes strong disease phenotypes, such as xeroderma pigmentosum and Cockayne’s syndrome in humans (10).
Drosophila melanogaster presents an interesting case where the GGR pathway for NER is present but TCR was thought to be missing (11, 12). The lack of fly homologs for genes required for TCR in other organisms — CSA/ERCC8 and CSB/ERCC6 — suggested that the pathway was lost during evolution (11). Furthermore, biochemical studies failed to detect TCR after UV-induced damage in D. melanogaster cell cultures (13, 14). In addition to the indirect evidence for TCR which is based on positive correlation of the compositional skews in introns (15) with expression (16), a recent study showed that TCR is operating in Drosophila S2 cells (17). This result raises the important question of how Drosophila is able to perform TCR when the key genes CSA and CSB are absent.
In E. coli, mismatch repair genes MutS and MutL are required for TCR (18). In yeast, genes required for NER interact with MMR genes (19) but MMR deficient cells are still performing TCR (20). In humans, MMR repair also interacts with NER (21) and evidence suggests that the pathway is involved in TCR of UV and oxidative damage (22). However, the issue remains controversial (23). Given the uncertainty about the functional basis of TCR in Drosophila, we determined the influence of the MutS homolog spellchecker1 (spel1) on TCR in flies using MMR deficient mutation accumulation (MA) lines. The lack of MMR in MA lines is expected to result in a high number of mispaired bases. Such bases do not only lead to mutations but also cause local structural and dynamic distortions in the DNA structure (24) and are hotspots for DNA damage due to the higher susceptibility of unpaired bases to chemical modifications (25). For example, the loss of msh2 in mice, Trypanosoma brucei, and T. cruzi increases oxidative damage of guanine by reactive oxygen species (26–28), which is repaired by TCR in murine cells (29).
Using a mismatch repair-deficient background, we find mutational asymmetries that are negatively associated with germline expression intensities, demonstrating functional TCR without spel1. Based on the absence of the TCR signatures in exons, we propose that an additional, exon-specific repair mechanism is operating.
Results
We generated a spel1 null mutant using CRISPR with guide RNAs targeting the 5’ and 3’ ends of the gene. Our mutant contained a double insertion of the template plasmid with the backbone of the vector (Supplementary Figure 1), a frequent event arising from the recombination of two plasmids into the locus (30). We propagated seven independent lines for 10 generations by brother-sister mating and identified 7,345 new single nucleotide variants (SNV) and 5,672 indels in females from these mutation accumulation lines. With 73.5% of the non-synonymous substitutions on the autosomes and 77.0% on the X chromosome, our data did not significantly deviate from the 75% expected under neutrality (31) (Fisher’s exact test (FET), p=0.4143 for the autosomes; FET, p=0.6573 for X).
The presence of TCR can be detected by mutational asymmetries between the transcribed and non-transcribed strands (6). The identification of mutational asymmetries is critically dependent on the correct null hypothesis. Drosophila introns have a skewed base composition, which depends on transcription levels (16). We confirmed that the fraction of thymines and cytosines on the transcribed strand is significantly negatively associated with expression in both ovaries (Wald test, OR=0.9977, p<2.2e-16 for thymines; Wald test, OR=0.9997, p=1.16e-13 for cytosines) and testes (Wald test, OR=0.9982, p<2.2e-16 for thymines; Wald test, OR=0.9994, p<2.2e-16 for cytosines) (Figure 1. a,b). We accounted for this by including the bias into the formulation of a null hypothesis for the expected number of mutations on the transcribed and on the non-transcribed strands. We calculated the expected bias with two different approaches: from the mutated genes and from a sample of genes with a similar expression as the mutated genes (see Methods). Both approaches produced highly consistent results.
5,071 SNVs located in genes were used to test the ratio of bias-adjusted mutation rates on the transcribed and non-transcribed strands for every mutation type. Without TCR, a rate ratio (RR) of 1 is expected and the statistical significance can be determined with a Poisson test. After multiple testing correction, C>A mutations occurred less often (Poisson test, RR=0.82; 95% CI: 0.70-0.95, adjusted p-value=0.038), (Figure 1. c; Supplementary Table 1) and T>C mutations more often on the transcribed strand (Poisson test, RR=1.12, 95% CI: 1.02-1.22, adjusted p-value=0.039) (Figure 1. c; Supplementary Table 1). Similar results were obtained using the gene expression sampling scheme (see Methods, Supplementary Figure 2). Assuming that TCR is causing this bias, this implies that cytosine and adenine are more likely to experience base damage than other bases in MMR deficient flies.
DNA repair and damage processes can differ between exons and introns (32, 33). We therefore analyzed exons and introns separately. After excluding SNVs which overlapped both exon and intron annotations, C>A mutations occurred less often on the transcribed strand (RR=0.732, 95% CI: 0.597-0.895, adjusted p-value=0.014), but exonic C>A mutations did not (RR=0.995, 95% CI: 0.783-1.263, adjusted p-value=1) (Figure 1. d; Supplementary Table 2). Despite intronic T>C mutations occurring slightly more often on the transcribed strand, this was not significant (Poisson test, RR=1.113, 95% CI: 0.996-1.243, adjusted p-value=0.155). However, looking for the effect of the 5’ and 3’ bases flanking the mutation, we observed that the A[T>C]N context is exhibiting a significant strand bias in introns with a rate ratio of 1.472 (Poisson test, 95% CI: 1.138-1.910, adjusted p-value=0.01) but not in exons (Poisson test, RR=0.894, 95% CI: 0.627-1.280, adjusted p-value=0.866). No other contexts exhibited strand bias (Figure 1. d; Supplementary Table 2). Since the null hypothesis was not adjusted for triplet composition, we updated our null hypothesis to take into account the 5’ and 3’ flanking bases by performing a permutation test (see Methods) and obtained similar results. Intronic A[T>C]N mutations still exhibited a significant strand bias (permutation test, p=0.001) while exonic mutations did not (permutation test, p=0.215) (Supplementary Figure 3).
To confirm that the strand bias is caused by TCR, we tested for expression differences in genes containing C>A or A[T>C]N mutations. In the case of an active TCR, a correlation between strand asymmetry and gene expression is expected, because DNA damage on the transcribed strand is more likely to be detected in highly expressed genes. Thus, mutations arising from DNA damage on the transcribed strand should be found in lowly expressed genes. We used the FlyAtlas2 (34) expression data set from ovaries and testes as a proxy for the expression environment where the mutations occurred. Consistent with these predictions, we found that the genes with intronic C>A mutations on the transcribed strand have on average lower expression in both ovaries (one-sided Wilcoxon rank-sum test, adjusted p-value=0.022) and testes (one-sided Wilcoxon rank-sum test, adjusted p-value=0.022) than genes with intronic C>A mutations on the non-transcribed strand (Figure 2. a). As expected from the lack of strand bias, the expression level of genes with exonic C>A mutations were not different (Figure 2. a). Genes with context-dependent A[T>C]N mutations were not differentially expressed (Figure 2. b). This could be due to either a lack of power or because the expression data used does not reflect the expression environment where the base damage occurred.
While the expression analysis suggests that TCR is responsible for the strand bias for C>A mutations, it is important to rule out the alternative explanation of a mutagenic effect of transcription on the non-transcribed strand. We used a randomization procedure (see Methods) to test if C>A mutations occur more frequently on the non-transcribed strand of highly expressed genes. Consistent with previous observations (16, 31), we found no evidence that transcription is mutagenic neither in testes (randomization test, p=0.611) nor in ovaries (randomization test, p=0.403) (Supplementary Figure 4) ruling it out as the source of the strand bias.
Based on the combined evidence, we conclude that TCR is operating in Drosophila biasing the C>A mutations and spel1 is not required. Nevertheless, it is not clear why TCR signatures are only detected for introns, but not for exons. Two different explanations can account for the lack of mutational strand bias in exons for the C>A mutations: i) TCR requires spel1 in exons or ii) an additional DNA repair mechanism is operating on exons, which erases the signal of TCR. The two explanations can be distinguished based on their different predictions for the relative mutation rates. MMR dependence for exons predicts an increased mutation rate for exons on the transcribed strand while the latter predicts a reduced exonic mutation rate for the non-transcribed strand. To test these hypotheses, we performed a permutation test while controlling for the triplet context in exons and introns to test for relative mutation rate differences. We found no evidence of elevated exonic mutation rate on the transcribed strand (permutation test, p=0.4762) (Supplementary Figure 5) showing that the lack of spel1 does not cause the missing strand bias. We found signs - although nonsignificant - of reduced exonic mutation rate on the non-transcribed strand for C>A mutations (Supplementary Figure 5) (permutation test, p=0.067) suggesting that the lack of exonic strand bias may be caused by a favorable repair. The A[T>C]N did not show differences in the relative mutation rates on the transcribed (permutation test, p=0.475) or the non-transcribed strand (permutation test, p=0.126) (Supplementary Figure 6).
Discussion
We demonstrated that TCR is independent of MMR in flies by uncovering TCR-induced mutational asymmetries in intronic C>A mutations in MMR deficient D. melanogaster mutation accumulation lines. Because UV-light was not used during the experiment, we are able to demonstrate that TCR in flies is not only limited to UV-induced damage, as previously seen (17), but can also repair other types of DNA damage. The C>A mutations can arise from mismatches with oxidatively damaged DNA (35). An important finding is that TCR does not cause mutational asymmetry in exons. We ruled out that this is caused by the MMR deficiency and found support for a pathway that protects exons over introns thus masking the signatures of TCR. A similar finding was made in human cells where less oxidative DNA damage accumulates in exons than in introns — possibly due to a favorable repair (33). If a similar process is occurring in flies, as our data suggest, we propose that the global repair pathway of nucleotide excision repair is favoring exons over introns. The global repair does not discriminate between the transcribed and non-transcribed strands and detects the same lesions as TCR thus explaining the lack of strand bias, the gene expression difference, and the signs of reduced exonic mutation rate on the non-transcribed strand for C>A mutations.
In summary, generating the largest de novo mutation data set from an MA study in D. melanogaster, we demonstrated that TCR operates against DNA damage in the germline independent of the MMR pathway. We uncovered differences in mutational processes of exons and introns and attribute this to an additional repair operating on exons. We anticipate the use of spel1 mutations will become a widely used approach to study mutation patterns in a broad range of species.
Materials and methods
Generating the spel1 deletion and mutation accumulation
The spel1 null mutant was generated from an isogenized Oregon-R strain using the CRISPR-Cas9 genome engineering tool. The 2nd and the 3rd chromosomes were isogenized with balancers and the variation on the X chromosome was reduced by 5 generations of full-sib mating. Two gRNAs targeting the second and the last exon of spel1 were cloned with the Gibson Assembly® Cloning Kit (New England Biolabs) into a BbsI (10,000 units/ml, NEB, R0539) digested pCDF4 (50) (Addgene plasmid # 49411; http://n2t.net/addgene:49411; RRID:Addgene_49411) expression vector. The ligation product was transformed into SURE2 cells and the construct was verified by Sanger sequencing.
A template for homology-directed repair was generated by Golden gate cloning. 1 kb homology arms were amplified from genomic DNA with primers LT41-LT44. Purified amplicons were mixed (30 ng each) with 50 ng pJET1.2-STOP-dsRed (51) (Addgene plasmid # 60944; http://n2t.net/addgene:60944; RRID:Addgene_60944), 50 ng pBS-GGAC-ATGC (51) (Addgene plasmid # 60949; http://n2t.net/addgene:60949; RRID:Addgene_60949), 1.5 μl 10x T4 ligation buffer, 1 μl BsmBI (10,000 units/ml, NEB, R0580), and water was added to 14 μl. After incubation for min at 55°C 1 μl T4 ligase (400,000 units/ml, NEB, M0202) was added. Ligation was performed by cycling the reaction between 5 min in 42°C and 5 min in 16°C overnight. Final digestion was performed for 30 min in 55°C followed by 10 min at 80°C to inactivate the enzyme. The ligation product was treated with Plasmid-safe nuclease (10,000 units/ml, Epicentre, E3101K) and transformed into SURE2 cells. Positive colonies were identified with colony PCR and recovered plasmids were verified by sequencing.
The germline transformation was achieved by microinjecting a mixture of the template (500 ng/μl), the gRNA expression vector (100 ng/μl), and pHsp70-Cas9 (52) (250 ng/μl) (Addgene plasmid # 60944; http://n2t.net/addgene:60944; RRID:Addgene_60944) into dechorionated fly embryos. F1 progeny were screened for the 3XP3::DsRed marker and a correct targeting of spel1 was confirmed with PCR and sequencing. A PCR was performed to detect the double integration where the template plasmid integrates twice into the locus with the backbone. All the primers used in this study are listed in Supplementary Table 3.
We performed 10 generations of mutation accumulation with full-sib mating and sequenced individual females from 7 surviving lines.
Library preparation and sequencing
Genomic DNA was extracted from a single female fly of each MA line using a standard high salt extraction method (36) with RNase A treatment. From each female, 70 ng genomic DNA was used to prepare paired-end libraries with the NEBNext Ultra II FS DNA Library Prep Kit (New England Biolabs, Ipswich, MA) using only 10% of the reagents recommended in the original protocol of the supplier. After double sided size selection targeting an insert size of 300 bp, libraries were amplified with dual-index primers using 5 PCR cycles. After purification with AMPureXP beads (Beckman Coulter, Brea, CA), the 7 libraries were quantified using the Qubit dsDNA HS Kit (Invitrogen, Carlsbad, CA), combined in equimolar amounts with additional 4 libraries from another experiment and sequenced on one lane of a HiSeq2500 using a 2×125bp protocol.
QC and reads mapping
Libraries were first demultiplexed using ReadTools (37) (version 1.5.2; AssignReadGroupByBarcode --splitSample, --maximumMismatches 1, providing the corresponding barcodes). The raw reads were assessed for their quality using FastQC software (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Low quality tails at 3’ end were trimmed using ReadTools (--mottQualityThreshold 20, --minReadLength 50, --disable5pTrim true) and BAM files were converted to compressed FASTQ files using ReadTools (ReadsToFastq -- interleavedInput true --barcodeInReadName true --outputFormat GZIP). As FastQC detected residual levels of adapter contamination, adapter cleaning was performed with the BBTools suit (38) using BBDuk (version 38.32; ktrim=r k=23 mink=11 hdist=1 tbo).
Processed paired-end reads were mapped to the D. melanogaster reference genome release 6.24 indexed with the bwa index command using BWA-MEM (39) (version 0.7.17; bwamem) on a Hadoop cluster using DistMap (40) (version 2.7.5).
PCR duplicates were removed using PICARD (http://broadinstitute.github.io/picard/) MarkDuplicates tool (version 2.21.3; REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=SILENT). We kept mapped reads with each segment properly aligned and removed reads that mapped equally well to multiple positions or have a low mapping quality using SAMtools (41) (version 1.9; -b -q 20 -f 0×002 -F 0×004 -F 0×008). We clipped the processed overlapping paired-end reads using the BamUtil suit (42) (version 1.0.13; bam clipOverlap --in --out --stats).
Variants inventory
The fasta reference was indexed using SAMtools faidx command. Processed BAM files were sorted and indexed with SAMtools for each chromosome arm (2L, 2R, 3L, 3R, X) separately using SAMtools view command. We then added a unique read group tag per sample using the PICARD AddOrReplaceReadGroups command. We increased the accuracy of variant calling by using two different tools; Freebayes (45, cloned from https://github.com/ekg/freebayes) (version v0.9.10-3-g47a713e) and GATK HaplotypeCaller (46) (version 4.0.12.0) and kept only variants that were identified with both tools. To use the parallel version of Freebayes, we split the reference into 1Mb regions with Freebayes fasta_generate_regions.py script (python version 2.7.17). For each chromosome arm, we used the freebayes-parallel executable (-C 1 –F 0.01 --min-base-quality 20, all other options set to default), providing the 1 Mb regions file and individual BAM files. Second, we followed (31) and used the GATK HaplotypeCaller with --heterozygosity 0.01 option with default settings.
We obtained two raw lists of variants per chromosome arm in a VCF format (43). Two different filtering procedures were applied for each variant caller.
For Freebayes, each raw list of variants was filtered as follows:
to remove variants based on depth at the variant position using BCFtools (44) (version 1.8; filter –i SAF>0 && SAR>0 && (SAF+SAR+SRF+SRR)>5),
to suppress variants within 5-bp of an INDEL using BCFtools (filter -g 5),
to keep variants with at most 2 alleles denoted by reference and alternate alleles using VCFtools (43) (version 0.1.15; --vcf --min-alleles 1 --max-alleles 2 --recode-INFO-all --recode),
to simplify multi-nucleotide polymorphisms into SNPs using vt cloned from https://github.com/atks/vt (45) (version 0.57721; decompose_blocksub, normalize commands successively),
to filter for QUAL>40 using VCFtools (--vcf –minQ 40 --recode-INFO-all --recode).
For GATK, we used GATK VariantFiltration with the options --filter-expression “QD < 2.0” --filter-name “QD” --filter-expression “FS > 60.0” --filter-name “FS” --filter-expression “MQ < 40.0” --filter-name “MQ” --filter-expression “MQRankSum < -12.5” --filter-name “MQRankSum” --filter-expression “ReadPosRankSum < -8.0” --filter-name “ReadPosRankSum”.
We intersected the two filtered VCF files retaining only variants with the same position using BEDtools (46) (version 2.27.1; intersect –u –a -b -wa –header). We then extracted private SNPs using BEDtools (intersect –v –a -b –header) providing all bgzipped and tabix-indexed (47) (version 1.8; -p vcf) VCF per line. Finally, we subtracted the variants lists with the variants called from 10 individual spel1 null flies, which did not go through MA, as a quality control for residual ancestral alternative alleles after having applied a similar pipeline; we masked the X region 6240639:6686943 from line 5 using BEDtools (intersect –v –a -b –header) where some residual variants were observed. We obtained a final set of 7,345 SNPs and 5,672 INDELs.
For our analyses we relied on the genome annotation from flybase Dmel-all-filtered-r6.30.gff (downloaded from: ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r6.30_FB2019_05/gff/ in May 2019).
Statistical analyses
All statistical analyses were done with R (48) (version 3.5.0).
Fraction of non-synonymous mutations compared to neutral expectation
The SnpEff software (49) (version 4.3) was used to distinguish synonymous and nonsynonymous mutations in the longest transcript of each gene. We performed a Fisher’s exact test to compare the observed and expected number of synonymous and non-synonymous mutations. Following (31), we used odds of 1:3 for synonymous and non-synonymous mutations as a neutral expectation.
Skew of intronic base composition
Gene expression data from ovaries and testes tissues were obtained from FlyAtlas2 (34), representing 16,781 genes. FPKM gene expression values were grouped into 40 bins, separately for ovaries and testes with the mltools::bin_data (50) (version 0.3.5; binType=“quantile”) R function. Since alternative splicing may generate ambiguous signals, 7 bases from the 5’ end and 35 bases from the 3’ end were removed from introns to exclude genomic regions containing splicing sequences as recommended in (15). AT (CG) skews were then calculated as the number of T (C) on the transcribed strand over the total number of A and T (C and G) bases. For each tissue and type of skew, we fitted a Generalized Linear Model (51) using the glm(cbind(#transcribed, #total-#transcribed), family=“binomial”) R function, and reported the Wald test p-values corresponding to the binned gene expression covariate.
Mutational strand bias
We restricted our analysis to unambiguous exons and introns and excluded annotations overlapping with other genes located on a different strand using the BEDtools intersect -s command (a GTF with the final annotation can be found in the Dryad repository). We used the Bioconductor MutationalPatterns package (52) (version 1.12.0) to count the different mutation types on the transcribed and non-transcribed strands. Our first approach was to estimate the expected mutation rate from the base composition on the transcribed and non-transcribed strand of genes with at least one mutation. Since without strand bias a ratio of 1 is expected, we calculated its significance and 95% confidence intervals using the poisson.test R function.
In the second approach, we accounted for the impact of gene expression intensity on base composition. For each of the 40 expression bins, we randomly sampled the same number of genes as observed being mutated in our SNPs set and calculated the expected strand bias from the sample.
We repeated the approaches for the expected intronic and exonic biases, using exclusively either intronic or exonic sequences. The p-values were corrected for multiple testing using the Benjamini-Hochberg procedure.
In order to take the 5’ and 3’ flanking bases of the A[T>C]N mutations into account in the null hypothesis, we adapted a permutation procedure from (32) to test for strand bias in exons and introns (Supplementary Figure 3). Briefly, we obtained the frequency of mutations for each of the 4 A[T>C]N contexts (triplets) genome-wide and rescaled the frequencies to sum up to 1. In parallel, we used the GATK tool CallableLoci (53) to obtain the callable sites per line and the BEDtools suit (maskfasta and getfasta commands) to mask the reference for non-callable sites. For both strands, we then retained as a sampling pool the number of callable triplets in the mutated genes for exons, introns, summed over each line, and multiplied it with the rescaled frequency to weight the sampling according to the genome-wide prevalence of triplets. Finally, we redistributed the observed number of mutations on the transcribed and non-transcribed strand separately 10,000 times to get the expected number of mutations on the transcribed strand in introns and exons. The p-values were calculated as the number of times the sampled value was higher than the observed one divided by 10,000.
Gene expression analysis for C>A and A[T>C]N mutations
Gene expression differences between genes containing C>A and A[T>C]N mutations on different strands were tested with either one-sided (intron) or two-sided (exon) Wilcoxon rank-sum test on the FPKM scale. We used a one-sided test for intronic sequences because the strand bias predicts the direction of gene expression difference. The p-values were corrected for multiple testing using the Benjamini-Hochberg procedure.
Mutagenic effect of transcription
To test if transcription is mutagenic, we performed a randomization test similarly as (54) (Supplementary Figure 4). We randomly picked 221 genes, corresponding to the number of C>A intronic mutations overlapping with the FlyAtlas2 data (over 236) on the non-transcribed strand, and computed the mean expression in ovaries and testes separately. The sampling was weighted by the length of the introns. This was done 10,000 times. For each tissue, a p-value was calculated as the number of times the randomly sampled mean values exceeded the observed mean divided by 10,000.
Decreased exonic mutation rate for C>A and A[T>C]N mutations
We used a similar permutation procedure as described above in the mutational strand bias subsection to test for reduced exonic mutation rates (Supplementary Figures 5, 6). We modified the sampling pool of callable triplets to include the genome-wide exons and introns with strands separated.
Code and data availability
The code (R and bash scripts) will be accessible in the following github repository: ***, available upon publication.
The final set of SNPs and INDELs as well as the updated annotation and intermediate files can be found from the following dryad repository: ***, available upon publication.
Raw reads will be available in the following SRA project: ***, available upon publication.
Authors contribution
L. T. performed experiments, V. N. performed sequencing, L. T., C. B. analyzed the data, L. T., C. B., V. N., C. S. wrote the paper, K.S. supervised the project and provided feedback, L. T., C. S. designed the study.
Acknowledgments
Sequencing was performed at the VBCF NGS Unit (https://www.viennabiocenter.org/facilities/next-generation-sequencing/). This work was supported by the Austrian Science Fund (FWF, grants W1225, P27630, P29133). We thank Lukas Endler for technical comments.