Monoallelic antigen expression in trypanosomes requires a stage-specific transcription activator

Monoallelic expression of a single gene family member underpins a molecular “arms race” between many pathogens and their host, through host monoallelic immunoglobulin and pathogen monoallelic antigen expression. In Trypanosoma brucei, a single, abundant, variant surface glycoprotein (VSG) covers the entire surface of the bloodstream parasite 1 and monoallelic VSG transcription underpins their archetypal example of antigenic variation. It is vital for pathogenicity, only occurring in mammalian infectious forms 1 . Transcription of one VSG gene is achieved by RNA polymerase I (Pol I) 2 in a singular nuclear structure: the expression site body (ESB) 3 . How monoallelic expression of the single VSG is achieved is incompletely understood and no specific ESB components are known. Here, using a protein localisation screen in bloodstream parasites, we discovered the first ESB-specific protein: ESB1. It is specific to VSG-expressing life cycle stages where it is necessary for VSG expression, and its overexpression activates inactive VSG promoters. This showed that monoallelic VSG transcription requires a stage-specific activator. Furthermore, ESB1 is necessary for Pol I recruitment to the ESB; however, the transcript processing and inactive VSG gene exclusion ESB sub-domains do not require ESB1. This shows that the cellular solution for monoallelic transcription is a complex factory of functionally distinct and separably assembled sub-domains.

proteins, required for exclusion of the inactive BESs 23,24 , associate the single active BES with the Spliced Leader (SL) array 25 chromosomal locations. These contain the repetitive genes encoding a sequence which, after transcription and processing, is added to every trypanosome mRNA 26 . Hence, in addition to other properties, VEX proteins link an ESB located exclusion phenomenon to an active VSG gene abundant mRNA processing capability. Notwithstanding these advances, bloodstream-specific factors (Fig. 1A, Extended Data Table 1) remain elusive and the statement that "No ESB-specific factor has yet been identified" 27 still holds true.
We performed a candidate protein localisation screen of proteins upregulated in the bloodstream form 28
We named this protein ESB1.
We used well-characterised ESB markers to confirm the ESB1 localisation. Pol I is the founding component of the ESB and localises to both the nucleolus and the ESB in bloodstream forms 3 . ESB1 colocalised with Pol I (RPA2) at the ESB (Fig. 1C) confirmed by measurement of the distance between signal centre points (Fig. 1D). The ESB also has a VEX sub-complex involved in exclusion of inactive ESs 23 . ESB1 did not precisely colocalise with VEX1 (Fig. 1C,D). After S phase, cells still exhibit a single ESB before the nucleus undergoes closed mitosis - a second ESB only forms during anaphase 29 . ESB1 always localises to a single focus per nucleus (Fig. 1E) and only duplicates during mitosis (Fig. 1F). ESB1 is therefore specific to the ESB both spatially (localisation) and temporally (life cycle stage-specific expression and cell cycle-dependent localisation).
To determine ESB1 function, we generated a bloodstream form ESB1 conditional knockout
cytokinesis and further rounds of organelle duplication (Fig. 2A,B).
To detect any effect on BES transcription, we used RNAseq to profile mRNA levels and showed ESB1 cKO caused a dramatic decrease (~250×) in ESAG mRNAs. Mapping ESAGs to BESs 8 showed that those with reduced mRNA levels were predominantly associated with the active BES (Fig. 2D, Extended Data Fig. 3E). The VSG in the active BES decreased ~8× (Fig. 2D) which we confirmed by qPCR (Fig. 2F). The difference in these levels is expected given VSG mRNA has a longer half-life than ESAG mRNAs 30 . ESB1 cKO therefore reduces active BES transcription.
This cKO phenotype was recapitulated with the more experimentally amenable RNAi knockdown of ESB1 (Extended Data Fig. 4). The strength of the phenotype naturally led to appearance of RNAi escape sub-populations 31 ; therefore, we analysed only early RNAi time points.
Procyclic forms lack an active BES, an ESB and do not express ESB1 although they use Pol I for expression of their surface coat protein (procyclin). We tested ESB1 cryptic function in procyclic forms by deleting both ESB1 alleles, which gave no apparent growth or morphology defect. RNAseq confirmed normal high expression of GPEET procyclin and no major changes to other mRNA transcripts (Fig. 2G-I, Extended Data Fig. 3F). ESB1 is therefore vital in bloodstream forms for monoallelic VSG expression but not required in procyclic forms.
We next determined whether ESB1 is required for the normal molecular composition of the ESB. We generated a panel of cell lines carrying the inducible ESB1 RNAi construct and tagged ESB proteins: RPA2, SUMO (as the ESB is associated with a highly SUMOylated focus 21 ), VEX1 or VEX2 (Fig. 3A-H). As shown by others, the ESB focus of RPA2 was visible in 40% of G1 nuclei (ie when not occluded by nucleolar RPA2) 3
Pol I to the ESB and higher local SUMOylation to form the HSF, but not formation of VEX foci.
The inverse, whether the ESB1 focus is VEX1-dependent, was analysed by depleting VEX1 using RNAi and observing tagged ESB1 (Fig. 3I-K). RNAseq profiling of mRNA confirmed VEX1 knockdown with no growth defect (Fig. 3I,J) and, as previously described 23 , derepression of inactive BESs (data not shown). Tagged ESB1 localisation was unchanged on VEX1 knockdown (Fig. 3K); therefore, formation of a singular ESB is not dependent on VEX1 repression of inactive BESs.
We then asked whether ESB1 overexpression could force ectopic BES expression and/or supernumerary ESB formation. Significant overexpression was achieved using a cell line with an additional inducible tagged ESB1 locus (using 100 ng/ml doxycycline) (Fig. 4A-F, Extended Data Fig. 3B). In contrast to ESB1 cKO, overexpression gave a small growth reduction and some cytokinesis defects (Fig. 4A,B). Overexpressed ESB1 still localised to the ESB, although with more dispersion in the nucleoplasm and cytoplasm, for both morphologically normal and abnormal cells (Fig. 4C). ESB1 overexpression in a cell line also expressing tagged RPA2 showed Pol I was not dispersed and still localised at the single nucleolus and ESB (Fig. 4G-I
confirmed using qRT-PCR in Fig. 4F) arising from a ~10× increase in ESB1 mRNA (Fig. 4E). ESB1 overexpression is therefore sufficient to cause activation of inactive VSG BESs whilst expression of procyclic form-specific surface proteins remained low (Extended Data Fig. 5A).
Finally, we forced expression of tagged ESB1 in procyclic form cells (Fig. 4J-N, Fig. S5B). Significant expression produced no growth or cytokinesis defects (Fig. 4J,K). Tagged ESB1 was nuclear but did not localise to a single extranucleolar ESB-like focus (Fig. 4L). RNAseq analysis showed a large increase (up to ~200×) in mRNA level for ESAGs, consistent with transcription initiation at BES promoters which are normally inactive in the procyclic form (Fig. 4M). In this particular strain, we interrogated expression of the ESAGs and VSG from the sequenced BES 32 . Every ESAG in this BES was upregulated, typically ~3-5× and up to ~80× (Fig. 4O,P). In contrast, VSG mRNAs (both published and from our de novo assembly of the transcriptome) were not strongly upregulated (Fig. 4M). We did not see transcript from VSG 10.1, found in the sequenced BES, nor upregulation of any of the VSGs commonly expressed by this strain in bloodstream forms during mouse infection 33 . This is despite ~50× overexpression of ESB1 transcript relative to endogenous bloodstream form expression (Fig. 4E, Fig. 4N). As for tagged ESB1 overexpression in the bloodstream form, procyclin mRNA levels also remained unchanged (Fig. 4M)

Parasite strains and cell culture
Trypanosoma brucei brucei strain Lister 427 was selected for monomorphic bloodstream form (BSF) experiments because its expression sites have been cloned and sequenced 36 , and are assembled into contigs in the 2018 re-sequence 8 . BSFs were grown in HMI-9 37 at 37°C with 5% CO2, maintained under approximately 2×10⁶ cells/ml by regular subculture. We confirmed that the active BES was BES1 containing VSG 427-2 (see Transcriptomic analysis).
T. brucei brucei strain TREU927 was selected for procyclic form (PCF) experiments as it is the original genome strain 38 with a well-annotated genome and was used for our genome-wide protein localisation project in PCFs 28,39 . PCFs were grown in SDM-79 40 at 28°C, maintained between approximately 6×10⁵ and 2×10⁷ cells/ml by regular subculture.
To enable CRISPR/Cas9-assisted genetic modifications and to use doxycycline-inducible genetic modifications we used PCF and BSF cell lines which express T7 polymerase, Tet repressor, Cas9 and the PURO drug selectable marker. These cell lines were generated using pJ1339, an expression construct which carries homology arms for integration in the tubulin locus. We have previously described the TREU927 PCF 1339 cell line 41 . To generate the Lister 427 BSF 1339 cell line, pJ1339 was linearised by restriction digest with HindIII and transfected into BSFs (see Electroporation and drug selection).
Culture density was measured with a haemocytometer (BSFs) or a CASY model TT cell counter (Roche Diagnostics) with a 60 µm capillary and exclusion of particles with a pseudo diameter below 2.0 µm (PCFs).

Electroporation and drug selection
Electroporation was used to transfect T. brucei with linear DNA constructs which have 5′ and 3′ homology arms, leading to construct integration into the target locus by homologous recombination. 1 to 5 µg of linearised plasmid DNA or DNA from the necessary PCR reactions was purified by either phenol chloroform extraction (for the medium-throughput bloodstream form localisation screen) or ethanol precipitation (other experiments) then mixed with approximately 3×10⁷ cells for BSFs or 1×10⁷ cells for PCFs resuspended in 100 µl of Tb-BSF buffer 42 . Transfection was carried out using program X-001 of the Amaxa Nucleofector IIb (Lonza) electroporator in 2 mm gap cuvettes. Following electroporation, cells were transferred to 10 ml pre-warmed HMI-9 (BSFs) or 10 ml SDM-79 (PCFs) for 6 h then the necessary selection drugs were added to select for cells with successful construct integration. Clonal cell lines were generated for all cell lines, except those from medium throughput tagging, by limiting dilution cloning.

Medium-throughput BSF localisation screen for ESB proteins
Our localisation-based screen for ESB proteins used endogenous tagging with a fluorescent protein (see Endogenous tagging). Tagging candidates were selected based on previously published mRNA abundance data determined by RNAseq 43 by selecting genes with transcripts significantly upregulated (p < 0.05, Student's t-test) in the BSF relative to the PCF and prioritising those >2.5× upregulated (Fig. 1B). We further prioritised genes with unknown function, and excluded VSG genes and pseudogenes, ESAGs, genes related to ESAGs (GRESAGs) and known invariant surface glycoproteins (ISGs). Some known proteins were, however, tagged as controls, such as ISG65 and GPI-PLC. We also used other transcriptomic and ribosome footprinting datasets for further manual prioritisation 43-47 .
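The selection criteria above can be sketched as a simple filter and ranking. This is a minimal illustration only: the `Gene` record, its field names and the category labels are hypothetical and do not reflect the layout of the published dataset, and the manually chosen controls (such as ISG65 and GPI-PLC) would be added separately.

```python
from dataclasses import dataclass

# Hypothetical record layout; field names are illustrative only.
@dataclass
class Gene:
    gene_id: str
    fold_change: float  # BSF / PCF mRNA abundance ratio
    p_value: float      # Student's t-test, BSF vs PCF
    category: str       # e.g. "unknown", "VSG", "pseudogene", "ESAG", "GRESAG", "ISG", "other"

# Categories excluded from the screen, as listed in the text.
EXCLUDED = {"VSG", "pseudogene", "ESAG", "GRESAG", "ISG"}

def select_candidates(genes, p_max=0.05, fold_min=2.5):
    """Return significantly BSF-upregulated genes, highest priority first."""
    kept = [g for g in genes
            if g.p_value < p_max
            and g.fold_change > 1.0
            and g.category not in EXCLUDED]
    # Rank: >fold_min upregulation first, then unknown function, then fold change.
    return sorted(kept, key=lambda g: (g.fold_change <= fold_min,
                                       g.category != "unknown",
                                       -g.fold_change))
```

The sort key is an assumption about how the two prioritisation rules combine; the text does not specify their relative weight.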
Tagging was primarily at the N terminus. The C terminus was tagged if the protein had a predicted signal peptide. We attempted tagging of 207 proteins with a 73% success rate, generating 153 tagged cell lines. Of these, 7 had a nuclear signal (0).

Endogenous tagging
N or C terminal tagging by modification of gene endogenous loci was carried out as previously described, using long primer PCR to generate tagging constructs and, for BSF tagging, PCR to generate DNA encoding sgRNA with a T7 promoter 48,49 . Primer design, PCR using the pPOT series of plasmids as templates and sgDNA production were carried out as previously described 48,49 . mNeonGreen (mNG) 50 was used for all tagging with a green fluorescent protein.
pPOT v4 Blast mNG was used for the medium-throughput bloodstream form localisation screen. pPOT v6 Blast 6×TY::mNG was used for other experiments. In this construct 3×TY epitope tags lie either side of the mNG (3×TY::mNG::3×TY); however, for simplicity we refer to this as a 6×TY::mNG tag. tdTomato (TdT) and pPOT v4 Hyg TdT were used for tagging with a red fluorescent protein for colocalisation experiments.
For ESB1 tagging in PCFs, where no fluorescent signal from tagging was detected, we generated a clonal cell line by limiting dilution and confirmed the correct fusion of the mNG CDS to the ESB1 CDS was achieved (see Endogenous locus ORF modification/loss PCR validation) (Extended Data Fig. 2A,B).
For ESB1 tagging in BSFs, to test whether the fluorescent protein tag on ESB1 caused a protein localisation or function defect we confirmed that the C terminally-tagged protein had the same localisation as the N terminally-tagged (Extended Data Fig. 2C,D). T. brucei are diploid. Therefore we also confirmed, by deletion of the untagged allele, that expression of N terminally-tagged ESB1 in the absence of the wild-type allele gave the same localisation (Extended Data Fig. 2E,F) and we saw no morphological or cell growth defect (data not shown).

Gene knockout and conditional knockout
Gene knockout was carried out using long primer PCR to generate deletion and sgRNA constructs 48,49 . As for endogenous tagging, this was carried out as previously described 48,49 . pPOT v7 Hyg and pPOT v6 Blast were used for generation of deletion constructs. We confirmed knockout by testing for loss of the target gene CDS and replacement of the target gene CDS with the drug selection marker (see Endogenous locus ORF modification/loss PCR validation). We were unable to generate a BSF ESB1 deletion cell line and therefore used a conditional knockout (cKO) approach by first generating a cell line capable of doxycycline-inducible exogenous ESB1 expression.
For exogenous ESB1 expression or overexpression, the Tb927.10.3800 ORF was amplified by PCR from T. b. brucei TREU927 gDNA and cloned into variants of pDex577 51 and pDex777 52 with a 1×TY::mNG combined fluorescence reporter and epitope tag. These are doxycycline-inducible constructs using a T7 promoter for the inducible expression which carry homology arms for integration into the transcriptionally silent minichromosome repeats. Each pDex577/pDex777 construct was linearised by restriction digest with NotI and transfected into BSFs and PCFs (see Electroporation and drug selection).
We titrated doxycycline concentrations to select concentrations which achieved desirable tagged ESB1 expression levels in BSFs and PCFs, using a combination of light microscopy (to evaluate fluorescence intensity of mNG relative to cell lines with an endogenous ESB1 mNG tag), Western blots (see Western blotting) (Extended Data Fig. 3A,B) and RNAseq (see Transcriptomic analysis). We selected induction conditions to give (i) expression comparable to endogenous expression levels in BSFs (pDex577 with 10 ng/ml doxycycline, Extended Data Fig. 3B), (ii) overexpression sufficient to generate an aberrant BSF phenotype (pDex577 with 100 ng/ml doxycycline, Extended Data Fig. 3B) or (iii) high overexpression in PCFs (pDex777 with 1 mg/ml doxycycline, Fig. 4N).
RNAseq confirmed no major perturbation of cellular transcripts in BSFs expressing tagged ESB1 using pDex577 with 10 ng/ml doxycycline (Extended Data Fig. 3C). We then deleted both endogenous ESB1 alleles (Extended Data Fig. 3D) while maintaining the cell line with 10 ng/ml doxycycline to generate the cKO cell line. The cKO phenotype was observed by washing doxycycline out of the culture (see Induction time series).

Endogenous locus ORF modification/loss PCR validation
Key endogenous locus modifications were validated by PCR, using genomic DNA (gDNA) extracted from the modified cell line with the DNeasy Blood & Tissue Kit (Qiagen) as the template. Primer pairs were designed to span from the endogenous DNA sequence to the DNA introduced by homologous recombination: for deletions, from the gene 5′ UTR to the drug selection marker ORF (Extended Data Fig. 2F, Extended Data Fig. 3D), and for tagging, from the gene ORF to the fluorescent tag ORF (Extended Data Fig. 2B,F). PCR product size was checked by agarose gel electrophoresis. Primer sequences used are listed in Extended Data Table 2. For genetic modifications where both gene alleles were modified (e.g. gene deletions) the first allele was modified, the modification confirmed by PCR using gDNA as the template, then the second allele was modified.

Inducible RNAi knockdown
For inducible ESB1 or VEX1 RNAi knockdown, we amplified a fragment of each target gene's ORF and cloned it into a new doxycycline-inducible RNAi construct, pDRv0.5. The resulting plasmid had two copies of the amplicon cloned in reverse complement separated by a 150 nt stuffer fragment, with transcription of the resulting "stem-loop" driven by two T7 promoters under the control of doxycycline that flanked the insert (see Extended Data Table 3 for primer sequences). Cells were transfected with NotI linearised plasmid and transgenic cells selected using Hygromycin B (see Electroporation and drug selection). RNAi knockdown was induced using 1 µg/ml doxycycline.
To confirm these RNAi constructs gave effective knockdown we introduced them into cell lines expressing an endogenously tagged copy of the target protein whose knockdown could be confirmed by light microscopy (Extended Data Fig. 4C) and/or Western blot (Extended Data Fig. 4D) and/or RNAseq to determine transcript abundance of the target gene (see Transcriptomic analysis) (Fig. 3J).
For ESB1 RNAi knockdown in cell lines expressing endogenously tagged RPA2, SUMO, VEX1 or VEX2 we confirmed effective ESB1 knockdown, and confirmed that escape from ESB1 RNAi had not occurred, by checking for the expected growth rate defect and determining the proportion of cells at different cell cycle stages in the population (see Induction time series).

Western blotting
We used Western blotting to confirm molecular weight and expression level of endogenously tagged and exogenously (over)expressed proteins. An anti-mNG (mouse monoclonal IgG2c, ChromoTek 32f6, RRID:AB_2827566) or anti-TY (from BB2 hybridoma, mouse monoclonal IgG1 53 ) primary antibody and anti-mouse HRP-conjugated secondary antibody were used following standard protocols.

Induction time series
Doxycycline-inducible and cKO cell lines were analysed as induction time series with paired induced and uninduced samples. Cells in logarithmic growth were subcultured to 1×10⁵ cells/ml (BSFs) or 1×10⁶ cells/ml (PCFs), one sample without and one with the necessary concentration of doxycycline for induction. Each 24 h, the culture density was measured, samples taken, then the remaining cells subcultured to 1×10⁵ cells/ml (BSFs) or 1×10⁶ cells/ml (PCFs), including doxycycline in the induced sample. For cultures with a strong growth defect, the culture was centrifuged at 1200 g for 5 min then the cell pellet was resuspended in fresh medium, and doxycycline added if needed, in order to maintain constant conditions. Each set of samples included a sample for light microscopy (to analyse cell morphology and cell cycle progression, see Microscopy) and any RNA, protein, or other samples.
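One common way to summarise a time series with daily subculture back to a fixed density is as cumulative growth, multiplying each measured density by the dilutions applied so far. The sketch below illustrates that calculation; it is an assumption about how such data could be processed, not a description of the analysis used here, and the function name is hypothetical.

```python
def cumulative_growth(densities, seed_density):
    """Cumulative cell density equivalents (cells/ml) across daily subcultures.

    densities: density measured at each 24 h time point, just before subculture.
    seed_density: density the culture was reset to after every measurement.
    """
    cumulative = []
    dilution_product = 1.0  # product of all fold-dilutions applied so far
    for measured in densities:
        cumulative.append(measured * dilution_product)
        # Subculturing back to seed_density dilutes by measured / seed_density.
        dilution_product *= measured / seed_density
    return cumulative
```

For example, a BSF culture seeded at 1×10⁵ cells/ml that reaches 8×10⁵ cells/ml on two successive days has a cumulative density of 8×10⁵ and then 6.4×10⁶ cells/ml.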

Microscopy
Light microscopy was carried out on live cells adhered to glass with DNA stained with Hoechst 33342, as previously described 54 . Images were captured on a DM5500 B (Leica) upright widefield epifluorescence microscope using a plan apo NA/1.4 63× phase contrast oil-immersion objective (Leica, 15506351) and a Neo v5.5 (Andor) sCMOS camera using MicroManager 55 .
Cell cycle progression was evaluated from microscope images. In the normal T. brucei cell cycle, kinetoplast (K, mitochondrial DNA) division precedes nuclear (N) division, giving 1K1N, 2K1N then 2K2N cells prior to cytokinesis. These approximately correspond to G1, S and post-mitosis to cytokinesis phases respectively. K/N numbers in cells were manually counted as a measure of cell cycle progression, with abnormal K/N numbers classified as 'other'.
The spacing of point-like structures, one in the green channel and one in the red, was measured by fitting a Gaussian in the X and Y directions to determine the signal centre point in each channel then calculating their separation, as previously described 56 . Prior to analysis, images were corrected for chromatic aberration as previously described 57 using a sample of 0.1 µm TetraSpeck multi-colour fluorescent beads (ThermoFisher) adhered to glass as a reference sample. Measurement error was determined by measuring green-red spacing using the multi-colour fluorescent beads.
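The centre-point measurement can be illustrated with a short sketch. The fitting procedure of the cited method is not reproduced here. Instead, as a stated substitution, each channel is summed into X and Y intensity profiles and a Gaussian centre is obtained from a least-squares parabola fit to the log intensity (Caruana's method). The sketch assumes background-subtracted, chromatic-aberration-corrected crops given as lists of pixel rows; all function names are hypothetical.

```python
import math

def gaussian_centre(profile):
    """Centre of a 1D Gaussian fitted to a background-free intensity profile.

    Fits ln(intensity) = a + b*x + c*x^2 by least squares; centre = -b / (2c).
    """
    pts = [(x, math.log(y)) for x, y in enumerate(profile) if y > 0]
    if len(pts) < 3:
        raise ValueError("need at least three positive samples")
    # Normal equations for the quadratic fit, as an augmented 3x4 matrix.
    s = [sum(x ** k for x, _ in pts) for k in range(5)]
    t = [sum((x ** k) * ly for x, ly in pts) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    b = m[1][3] / m[1][1]
    c = m[2][3] / m[2][2]
    if c >= 0:
        raise ValueError("profile is not peaked")
    return -b / (2 * c)

def spot_centre(image):
    """(x, y) centre of a single spot in a 2D list of pixel rows."""
    x_profile = [sum(col) for col in zip(*image)]
    y_profile = [sum(row) for row in image]
    return gaussian_centre(x_profile), gaussian_centre(y_profile)

def spot_separation(green, red, pixel_size=1.0):
    """Distance between green and red spot centres, in units of pixel_size."""
    gx, gy = spot_centre(green)
    rx, ry = spot_centre(red)
    return math.hypot(gx - rx, gy - ry) * pixel_size
```

The log-parabola fit is exact for a noise-free Gaussian but is sensitive to noise in dim pixels, so a weighted or non-linear fit may be preferable for real images.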
For blinded counts of nuclear structures, one researcher identified and cropped in-focus nuclei (based on Hoechst 33342 signal) from 1K1N cells in microscopy fields of view from a mixture of test and control samples. Each cropped image was saved with a randomised unique file name, recording the source sample in an index file. A second researcher then classified the image files containing single nuclei, then the analysis was unblinded using the index file.
To quantify uniquely mapped read coverage, the resulting bam file was filtered using samtools view (version 1.7) with flags -q 10, -F 0x504 and -f 0x02, which keeps reads mapped in proper pairs with a MAPQ of at least 10 and excludes unmapped reads, secondary alignments and PCR or optical duplicates. RPKM was calculated from the output of samtools (version 1.7) idxstats. Mean coverage was calculated from the output of samtools depth with flags -aa and -d 10000000.
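The filter flags and the RPKM calculation can be made concrete with a short sketch. The flag bit meanings follow the SAM format, and the parser assumes the standard four-column tab-separated `samtools idxstats` output (name, length, mapped, unmapped). Normalising by the total mapped reads across all reference sequences is an assumption, as the text does not state the denominator used; the function names are hypothetical.

```python
# SAM flag bits referenced by the samtools view filters in the text.
FLAG_BITS = {
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x100: "secondary alignment",
    0x400: "PCR or optical duplicate",
}

def decode_flags(mask):
    """Names of the SAM flag bits set in mask."""
    return [name for bit, name in FLAG_BITS.items() if mask & bit]

def rpkm_from_idxstats(text):
    """RPKM per reference sequence from `samtools idxstats` output text."""
    rows = []
    for line in text.strip().splitlines():
        name, length, mapped, _unmapped = line.split("\t")
        if name != "*":  # the '*' row holds reads with no reference sequence
            rows.append((name, int(length), int(mapped)))
    total = sum(mapped for _, _, mapped in rows)
    if total == 0:
        return {name: 0.0 for name, _, _ in rows}
    # Reads per kilobase of sequence per million mapped reads.
    return {name: mapped / (length / 1e3) / (total / 1e6)
            for name, length, mapped in rows if length > 0}
```

Here `decode_flags(0x504)` lists the three exclusion criteria (unmapped, secondary alignment, duplicate) and `decode_flags(0x02)` the proper-pair requirement.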

Transcriptomic analysis
To confirm changes to active BES VSG expression, we carried out quantitative reverse transcription PCR (qRT-PCR) using a one-step protocol directly from total RNA using primers specific to VSG 427-2 and β-tubulin (Extended Data Table 4). RNA samples were diluted to approximately equal concentration based on OD260. A dilution series of six three-fold steps from 1:3⁰ (1:1) to 1:3⁶ (1:729) dilutions of RNA from the parental cell line was used to confirm critical cycles for both VSG 427-2 and tubulin fell in the linear range. Mean VSG 427-2 and tubulin critical cycle was determined in triplicate using 1:10 diluted RNA samples. Samples were compared using VSG 427-2 to tubulin critical cycle ratio.
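The dilution series and the comparison metric can be sketched as follows. The expected cycle shift assumes idealised amplification (a doubling per cycle when `efficiency` is 1.0), which is a general property of qPCR rather than something reported in the text; the function names are hypothetical.

```python
import math

def dilution_series(factor=3, steps=6):
    """Fold-dilutions 1:factor^0 up to 1:factor^steps."""
    return [factor ** i for i in range(steps + 1)]

def expected_cycle_shift(fold_dilution, efficiency=1.0):
    """Extra cycles expected after dilution; efficiency 1.0 means doubling per cycle."""
    return math.log(fold_dilution) / math.log(1 + efficiency)

def cycle_ratio(vsg_cycles, tubulin_cycles):
    """Ratio of mean VSG to mean tubulin critical cycle from replicate wells."""
    def mean(values):
        return sum(values) / len(values)
    return mean(vsg_cycles) / mean(tubulin_cycles)
```

Within the linear range, each three-fold dilution step should then delay the critical cycle by about log2(3), or roughly 1.58 cycles, for both targets.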
For de novo transcriptome assembly we used Trinity (version 2.11.0) guided by Harvard FAS best practices 60 . Sequencing errors were first corrected using Rcorrector (version 1.0.4) 61 and uncorrectable reads were removed, then any remaining adapters and low quality read sections trimmed with Trim Galore! (version 0.6.0) with flags --length 36, -q 5, --stringency 1 and -e 0.1.
Finally, sequences at the end of the read which exactly matched 4 or more bases of the 3′ end of the T. brucei spliced leader sequence were trimmed using a custom Python script. Trinity was then run with default settings to generate the de novo assembly.

Tagging does not perturb ESB1 localisation or function. A) Clonal bloodstream form and procyclic form cell lines expressing Tb427.10.3800 or Tb927.10.3800 (ESB1) N terminally tagged with 6×TY::mNG respectively were re-generated following the initial screen. Epifluorescence images of the localisation of the tagged protein by mNG fluorescence. B) Confirmation of the expected genetic modification of the cell lines in A) by PCR from genomic DNA using a forward mNG and a reverse ESB1 ORF primer. Schematic shows the primer binding sites, DNA gel shows the resulting PCR products from extracted genomic DNA from the tagged (Tag.) or parental (Par.) cell line. C) Epifluorescence images of bloodstream form cell lines expressing 6×TY::mNG::ESB1 (N terminal tag), ESB1::mNG::6×TY (C terminal tag). D) Count of the number of points per nucleus at different stages of the cell cycle (1K1N, 2K1N and 2K2N) for N or C terminally tagged ESB1. E) Epifluorescence image of a single knockout (sKO) bloodstream form cell line with one N terminally tagged ESB1 allele and the other deleted by replacement with a drug selectable marker. F) PCR validation of the sKO cell line. Schematics represent the deleted ESB1 ORF (top) and N terminally tagged locus (bottom) and primer binding sites, DNA gels show the resulting PCR products from extracted genomic DNA.
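The spliced leader trimming step in the de novo assembly pipeline is described only as a custom script, so the following is a hedged sketch of one way it could work rather than the script itself. It assumes the match is anchored at the 5′ end of the read (or, for the opposite strand, at the 3′ end as the reverse complement), and `SL_PLACEHOLDER` is an illustrative sequence that should be replaced with the verified spliced leader sequence. All names are hypothetical.

```python
# Illustrative placeholder only; substitute the verified spliced leader sequence.
SL_PLACEHOLDER = "CAGTTTCTGTACTATATTG"

def trim_sl_prefix(read, sl, min_match=4):
    """Remove the longest read prefix equal to a suffix of sl (>= min_match bases)."""
    for k in range(min(len(read), len(sl)), min_match - 1, -1):
        if read.startswith(sl[-k:]):
            return read[k:]
    return read  # no exact match of sufficient length

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def trim_sl_suffix(read, sl, min_match=4):
    """Opposite strand: remove a read suffix matching the reverse complement of the sl 3' end."""
    trimmed = trim_sl_prefix(reverse_complement(read), sl, min_match)
    return reverse_complement(trimmed)
```

Searching from the longest candidate match downwards ensures the whole spliced leader remnant is removed rather than only its last few bases.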