Abstract
Despite the conserved essential function of centromeres, centromeric DNA itself is not conserved1–4. The histone-H3 variant, CENP-A, is the epigenetic mark that specifies centromere identity5–8. Paradoxically, CENP-A normally assembles on particular sequences at specific genomic locations. To gain insight into the specification of complex centromeres, we took an evolutionary approach, fully assembling the genomes, including centromeres, of related fission yeasts. Centromere domain organization, but not sequence, is conserved between Schizosaccharomyces pombe, S. octosporus and S. cryophilus, with a central CENP-ACnp1 domain flanked by heterochromatic outer-repeat regions. Conserved syntenic clusters of tRNA genes and 5S rRNA genes occur across the centromeres of S. octosporus and S. cryophilus, suggesting conserved function. Remarkably, non-homologous centromere central-core sequences from S. octosporus are recognized in S. pombe, resulting in cross-species establishment of CENP-ACnp1 chromatin and functional kinetochores. Therefore, despite the lack of sequence conservation, Schizosaccharomyces centromere DNA possesses intrinsic conserved properties that promote assembly of CENP-A chromatin. Thus, centromere DNA can be recognized, and can function, over unprecedented evolutionary timescales.
Centromeres are the chromosomal regions upon which kinetochores assemble to mediate accurate chromosome segregation. Evidence suggests that both genetic and epigenetic influences define centromere identity1,2,4,7,9. S. pombe, a paradigm for dissecting complex regional centromere function, has demarcated centromeres (35-110 kb) with a central domain assembled in CENP-ACnp1 chromatin, flanked by outer-repeat elements assembled in RNAi-dependent heterochromatin, in which histone-H3 is methylated on lysine-9 (H3K9)10–13. Heterochromatin is required for establishment, but not maintenance, of CENP-ACnp1 chromatin6,14. We have proposed that the key to the ability of the S. pombe central-core to establish CENP-A chromatin is not its sequence per se, but the properties programmed by that sequence15. To investigate whether these properties are conserved, we determined the centromere sequences of other Schizosaccharomyces species and tested their cross-species functionality.
Long-read (PacBio) sequencing permitted complete assembly of the genomes of S. octosporus (11.9 Mb) and S. cryophilus (12.0 Mb), including their centromeres, extending genome sequences16 to telomeric or subtelomeric repeats or rDNA arrays (Supplementary Figs. 1–3, Supplementary Tables 1,2). Consistent with their closer evolutionary relationship16,17, S. octosporus and S. cryophilus (32 My separation, compared to 119 My separation from S. pombe) exhibit the greatest synteny (Fig. 1a). Synteny is preserved adjacent to centromeres (Fig. 1b). Circos plots indicate that a chromosome-arm translocation occurred within two ancestral centromeres to generate S. cryophilus cen2 (S.cry-cen2) and S.cry-cen3, relative to S. octosporus and S. pombe (Fig. 1b). Despite centromere-adjacent synteny, Schizosaccharomyces centromeres lack detectable sequence homology (see below). All centromeres contain a central domain, in which a central-core (cnt) is surrounded by inverted repeat (imr) elements unique to each centromere (Fig. 2, Supplementary Fig. 4, Supplementary Tables 3–6). CENP-ACnp1 localises to fission yeast centromeres (Fig. 2a), and ChIP-seq indicates that central domains are assembled in CENP-ACnp1 chromatin, flanked by various outer-repeat elements assembled in H3K9me2 heterochromatin (Fig. 2b,c). Despite the lack of sequence conservation, S. octosporus and S. cryophilus centromere organisation is strongly conserved with that of S. pombe: CENP-ACnp1-assembled central domains are separated by clusters of tRNA genes from outer-repeats assembled in heterochromatin10–11 (Supplementary Fig. 5, Supplementary Tables 7,8). In contrast, our analyses of the partially assembled, transposon-rich centromeres of S. japonicus reveal heterochromatin on all classes of transposons but CENP-A on only two classes (Supplementary Fig. 5, Supplementary Table 9)16.
Numerous 5S rRNA genes are located in the heterochromatic outer-repeats of S. octosporus and S. cryophilus centromeres, but not in those of S. pombe (Fig. 1a, Supplementary Tables 10,11). Almost all of them (25/26 and 20/20, respectively) lie within Five-S-Associated Repeats (FSARs; 0.6-4.2 kb) (Fig. 3a), which encompass ~35% of outer-repeat regions. FSARs exhibit 90% intra-class homology (Supplementary Table 12), but no interspecies homology. The three types of FSAR repeat almost always occur together, in the same order and orientation, but vary in copy number: in S. octosporus, oFSAR-1 (1 copy), oFSAR-2 (1-9 copies) and oFSAR-3 (1 copy); in S. cryophilus, cFSAR-1 (1-3 copies), cFSAR-2 (1-2 copies) and cFSAR-3 (1 copy). Both sides of S. octosporus and S. cryophilus centromeres contain at least one FSAR-1-2-3 array, except the right side of S.cry-cen2, which has two lone cFSAR-3 elements (Fig. 3a, Supplementary Fig. 4). S. cryophilus cFSAR-2 and cFSAR-3 repeats share ~400 bp of homology (88% identity), corresponding to hsp16 heat-shock protein ORFs (Fig. 3a,b, Supplementary Table 13). These ORFs are intact, suggesting that they are functional, maintained by selection and expressed under some conditions. Phylogenetic gene trees indicate that cFSAR-3-hsp16 genes are more closely related to each other than to those in subtelomeric regions or in cFSAR-2s (Fig. 3b), consistent with repeat homogenisation18–20. cFSAR-1s contain an eroded ORF with homology to a small hypothetical protein, and S. octosporus oFSAR-2s contain a region of homology to a family of membrane proteins (Fig. 3a). The functions of centromere-associated hsp16 genes and other ORF-homologous regions remain to be explored.
S. cryophilus heterochromatic outer-repeats contain additional repetitive elements, including a 6.2 kb element (cTAR-14) with homology to the retrotransposon Tcry1 and to transposon remnants at the mating-type locus16 (Figs. 1a,2b, Supplementary Fig. 4 and Supplementary Tables 3,4,14). Tcry1 is located in the chrIII-R subtelomeric region (Supplementary Figs. 3,4, Supplementary Table 1). Although no retrotransposons have been identified in S. octosporus, transposon remnants are present at the mating-type locus and within oTAR-14ex in S.oct-cen3 outer-repeats (Fig. 2c, Supplementary Figs. 1,4 and Supplementary Tables 5,6,15). Hence, transposon remnants, FSARs and other repeats are assembled in heterochromatin at S. octosporus and S. cryophilus centromeres and may mediate heterochromatin nucleation.
tDNA clusters occur at transitions between CENP-A and heterochromatin domains in two of three centromeres in S. octosporus (S.oct-cen2, S.oct-cen3) and S. cryophilus (S.cry-cen1, S.cry-cen2), and are associated with low levels of both H3K9me2 and CENP-ACnp1 (Fig. 2b,c), suggesting that they may act as boundaries, as in S. pombe21–23. No tDNAs demarcate the CENP-A/heterochromatin transition at S.cry-cen3. Instead, this transition coincides precisely with 270-bp LTRs (Fig. 2b, Supplementary Tables 3,4,14), which may also act as boundaries24–26. Like tDNAs, LTRs are regions of low nucleosome occupancy, which may counter spreading of heterochromatin26,27. In addition, tDNA clusters occur near the extremities of all centromeres in both species, separating heterochromatin from adjacent euchromatin. tDNAs and LTRs are thus likely to act as chromatin boundaries at fission yeast centromeres.
A high proportion (~32%) of tRNA genes in S. pombe, S. octosporus, and S. cryophilus genomes are located within centromere regions28 (Figs. 1a,3c; Supplementary Tables 16–18). Centromeric tDNAs are intact and are conserved in sequence with their genome-wide counterparts, indicating that they are functional genes. Two major, conserved tDNA clusters reside exclusively within S. octosporus and S. cryophilus centromeres (p-value<0.00001; q-value<0.05) (Fig. 3c,d). Cluster1 comprises several subclusters of 2-3 tDNAs in various combinations of up to 8 tDNAs, whilst Cluster2 contains up to 5 tDNAs (Fig. 3d); 17 different tDNAs (14 amino acids) are represented, none of which are unique to centromeres (Fig. 3c). Intriguingly, the order and orientation of tDNAs within clusters are conserved between species, but intervening sequence is not (Fig. 3d,e). Strikingly, in addition to local tDNA cluster conservation, inspection of centromere maps reveals synteny of tDNAs and clusters across large portions of S. octosporus and S. cryophilus centromeres. For example, the tDNA order AIR-RKL-E-T-T-L-DVAIR-RKLEF-A-DV (single-letter amino-acid code) is observed at S.oct-cen1 and S.cry-cen3 (Supplementary Fig. 6). This synteny, together with the fact that both centromeres have small central-cores and long imrs, suggests that these two centromeres are ancestrally related (Fig. 3f). Similarly, at S.oct-cen3 and S.cry-cen2, tDNAs occur in the order NME-DV-AIRKE-EKRIA-VD-EMN-RIAVD, and at S.oct-cen2 and S.cry-cen1 the same tDNAs are present in the imr repeats and beyond (FELK-KL-E-DV). Central-cores have similar sizes and structures in the two species, each containing long (oCNT-L, 6.4 kb; cCNT-L, 6.0 kb) and short (oCNT-S, 1.2 kb; cCNT-S, 1.3 kb) species-specific repeats (Fig. 3f, Supplementary Tables 3–6,19). CNT repeats are arranged head-to-tail at one centromere and head-to-head at the other centromere in each species.
Together, these similarities suggest ancestral relationships between S.oct-cen2 and S.cry-cen1, and between S.oct-cen3 and S.cry-cen2. Further, where synteny appears to break down, patterns of tDNA clusters suggest that specific centromeric rearrangements occurred between the species. For instance, tDNA clusters at the edges of S.cry-cen2R and S.cry-cen3L are consistent with an inter-centromere arm translocation relative to S.oct-cen1R and S.oct-cen2R, as indicated by gene synteny maps (Figs. 1b, 4a and Supplementary Fig. 6).
BLASTN revealed no central-core sequence homology between species. To identify potential underlying centromere sequence features, 5-mer frequencies, normalized for centromeric AT bias, were used in a Principal Component Analysis (PCA). CENP-A-associated regions of all three genomes group together, distinct from the majority of non-centromere sequences (p-value = 9.3 × 10⁻⁷) (Fig. 4b,c). Interestingly, S. pombe neocentromere-forming regions29 also cluster separately from other genomic regions, sharing sequence features with centromeres.
K-mer analysis and conserved centromeric organisation prompted us to investigate cross-species functionality of protein and DNA components of Schizosaccharomyces centromeres. GFP-tagged CENP-ACnp1 protein from each species localised to S. pombe centromeres and complemented the cnp1-1 mutant30 (Fig. 5a-c), indicating that heterologous CENP-A proteins assemble and function at S. pombe centromeres, despite normally assembling on non-homologous sequences in their respective organisms.
Introduction of S. pombe central-core (S.pom-cnt) DNA on minichromosomes into S. pombe results in the establishment and maintenance of CENP-A chromatin if S.pom-cnt is adjacent to heterochromatin or if CENP-A is overexpressed6,14,15,31. S.oct-cnt regions (3.2-10 kb) or S.pom-cnt2 (positive control) were placed adjacent to S. pombe outer-repeat DNA in minichromosome constructs (Fig. 6a), which were transformed into S. pombe cells expressing wild-type levels of CENP-A (wt-CENP-A) or overexpressing S. pombe GFP-CENP-ACnp1 (hi-CENP-ACnp1)15. Acquisition of centromere function is indicated by minichromosome retention on non-selective indicator plates (white/pale-pink colonies) and by the appearance of sectored colonies (Fig. 6b,c). The pHET-S.pom-cnt2 minichromosome, containing S.pom-cnt2, established centromere function at high frequency immediately upon transformation in hi-CENP-ACnp1 cells (90%) and at lower frequency in wt-CENP-A cells (15%; not shown). Centromere function was established on S.oct-cnt-containing minichromosomes in hi-CENP-ACnp1 cells only (Fig. 6d). Centromere function was not due to minichromosomes gaining portions of S. pombe central-core DNA (data not shown). CENP-ACnp1 ChIP-qPCR indicated that, on minichromosomes with established centromere function, CENP-ACnp1 chromatin was assembled on non-homologous S.oct-cnt DNA to levels similar to those at endogenous S. pombe centromeres and at S.pom-cnt2 on a minichromosome (Fig. 6e). Minichromosomes containing S.oct-cnt segregated efficiently (Fig. 6d) and, once centromere function was established, no longer required CENP-ACnp1 overexpression to maintain it (Fig. 6f), consistent with the self-propagating ability of CENP-A chromatin5,15. These analyses indicate that S.oct-cnt is competent to establish CENP-A chromatin and centromere function in S. pombe when CENP-ACnp1 is overexpressed, suggesting that S. octosporus central-core DNA has intrinsic properties that promote the establishment of CENP-A chromatin despite lacking sequence homology.
Based on conserved features, ancestral Schizosaccharomyces centromeres may have consisted of a CENP-ACnp1-assembled central-core surrounded by tDNA clusters and 5S rDNAs. We surmise that RNAPIII promoters provided targets for transposon integration32, followed by heterochromatin formation to silence retrotransposons and preserve genome integrity33,34. The ability of heterochromatin to recruit cohesin35,36, which benefits chromosome segregation, then selected for maintenance of heterochromatin37,38 rather than of the underlying sequence, which evolved by repeat expansion and continuous homogenisation18–20. Because tDNAs performed important functions (as boundaries preventing heterochromatin spread into central-cores, and perhaps in higher-order centromere organisation and architecture), tDNA clusters were maintained21. In S. pombe, non-centromeric and centromeric tDNAs and 5S rDNAs cluster adjacent to centromeres in a TFIIIC-dependent manner22,23. The multiple tandem centromeric 5S rDNAs and tDNAs could contribute to a robust, highly folded heterochromatin structure that promotes optimal kinetochore configuration for coordinated microtubule attachment and accurate chromosome segregation38.
The lack of overt sequence conservation between centromeres of different species appears not to prevent functional conservation, which may be driven by underlying sequence features or by properties such as the transcriptional landscape. Maintenance of centromere function has been observed at a pre-established human centromere in chicken cells39 (310 My divergence), and CENP-A establishment on human alpha-satellite DNA has been observed in mouse cells40 (90 My divergence). The latter is surpassed by the competence of S. octosporus central-core DNA to establish CENP-A chromatin in S. pombe, from which it is separated by 119 My of evolution16 (equivalent to 383 My using a chordate molecular clock). Thus, our analyses extend the evolutionary timescale over which cross-species establishment of CENP-A chromatin has been demonstrated.
Methods
Cell growth and manipulation
Standard genetic and molecular techniques were followed. Fission yeast methods were as described41. Strains used in this study are listed in Supplementary Table 20. All Schizosaccharomyces strains were grown at 32°C in YES, except S. cryophilus which was grown at 25°C unless otherwise stated. S. pombe cells carrying minichromosomes were grown in PMG-ade-ura. For low GFP-tagged CENP-ACnp1 protein expression from episomal plasmids, cells were grown in PMG-leu with thiamine.
PacBio sequencing of genomic DNA
High-molecular-weight genomic DNA was prepared from S. cryophilus, S. octosporus and S. japonicus using a Qiagen Blood and Cell Culture DNA Kit (Qiagen), according to the manufacturer’s instructions. Pacific Biosciences (PacBio) sequencing was carried out at the CSHL Cancer Center Next Generation Genomics Shared Resource. Samples were prepared following the standard 20 kb PacBio protocol. Briefly, 10-20 μg of genomic DNA was sheared to 20 kb using a g-TUBE (Covaris). Sheared DNA was treated with ExoVII (PacBio), damage repair mix and end repair mix according to the standard PacBio 20 kb protocol. Repaired DNA underwent blunt-end ligation to add SMRTbell adapters. For some libraries, 10-50 kb molecules from 1-2 μg of SMRTbell library were size-selected using BluePippin (Sage Science), after which samples were annealed to PacBio SMRTbell primers per the standard PacBio 20 kb protocol. Annealed samples were sequenced on the PacBio RSII instrument with P4/C3 chemistry. MagBead loading was used to load each sample at a concentration of 50-200 pM. Additional PacBio sequencing (without BluePippin size selection) was performed by the Biomedical Research Core Facilities, University of Michigan, using the DNA Sequencing Kit XL 1.0, DNA Template Prep Kit 2.0 (3 kb-10 kb) and DNA/Polymerase Binding Kit P4. MagBead Standard Seq v2 sequencing was performed on a PacBio RSII sequencer using a 10,000 bp size bin, no Stage Start and a 2-hour observation time. A summary of the PacBio sequencing performed is provided in Supplementary Table 21.
De novo whole genome assembly of PacBio sequence reads
PacBio reads were assembled using HGAP3 (the Hierarchical Genome Assembly Process, version 3)42. Reads were first sorted by length, and the longest 30% were used as seed reads by HGAP3. All remaining reads at least 1 kb in length were used to error-correct (polish) the seed reads. These corrected reads were used to assemble the genomes de novo, and Quiver was used to generate consensus genome contigs. Comparisons with the ChIP-seq input data and the Broad Institute Schizosaccharomyces reference genomes16 showed very high agreement with these datasets. The S. octosporus and S. cryophilus chromosomes were named according to their sequence lengths, the longest chromosome being labelled chromosome I in each case.
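The read partitioning and chromosome-naming rules described above can be illustrated with a short Python sketch. It is not HGAP3 code (HGAP3 performs seed selection internally); it assumes the 30% cutoff is applied to read count after sorting by length, and the helper names `split_seed_reads` and `name_chromosomes` are hypothetical.

```python
from typing import Dict, List, Tuple

ROMAN = ["I", "II", "III", "IV", "V"]  # enough labels for fission yeast chromosomes


def split_seed_reads(read_lengths: Dict[str, int], seed_fraction: float = 0.30,
                     min_correction_len: int = 1000) -> Tuple[List[str], List[str]]:
    """Split reads into long seed reads and shorter reads used to correct them.

    Assumption: the 30% cutoff is applied to the number of reads.
    """
    ranked = sorted(read_lengths, key=read_lengths.get, reverse=True)
    n_seed = int(round(len(ranked) * seed_fraction))
    seeds = ranked[:n_seed]
    # Remaining reads shorter than 1 kb are not used for correction.
    correctors = [r for r in ranked[n_seed:] if read_lengths[r] >= min_correction_len]
    return seeds, correctors


def name_chromosomes(contig_lengths: Dict[str, int]) -> Dict[str, str]:
    """Label chromosome-scale contigs I, II, III... from longest to shortest."""
    ranked = sorted(contig_lengths, key=contig_lengths.get, reverse=True)
    return {contig: numeral for numeral, contig in zip(ROMAN, ranked)}
```

For instance, with ten reads the three longest become seeds and only the remaining reads of at least 1 kb are kept for correction.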
De novo assembly of the S. pombe genome using nanopore technology
Genomic DNA was extracted as described previously43. DNA purity and concentration were assessed using a Nanodrop 2000 and the double-stranded high sensitivity assay on a Qubit fluorometer, respectively. Genomic DNA was sequenced using the MinION nanopore sequencer (Oxford Nanopore Technologies). Three sequencing libraries were generated using the 1D ligation kit SQK-LSK108, the 2D ligation kit SQK-NSK007 and the 1D Rapid sequencing kit SQK-RAD002, following the manufacturer's guidelines. Each library was sequenced on one MinION flow cell. Sequencing reads were base-called using Metrichor (1D and 2D ligation libraries) or Albacore (rapid sequencing library). The combined dataset incorporating reads from the three flow cells was assembled using Canu v1.5. The assembly was computed using default Canu parameters and a genome size of 13.8 Mbp. QUAST v3.2 was used to evaluate the genome assembly.
Genome annotation and chromosome structure
Genes were annotated onto the genome both de novo, using BLAST and the sequences of known genes, and by using liftOver (https://genome-store.ucsc.edu) to carry over the previous gene annotation from the Broad Institute reference genomes16. CrossMap44 was then used, with the resulting chain files, to lift the annotation over to the new, updated genome. The locations of tDNAs were predicted using tRNAscan45,46. Dfam 2.0 (ref. 47) was used to annotate repetitive DNA elements. MUMmer 3.23 (ref. 48) was used to compare the genomes and to annotate repeat elements and tandem repeat sequences, including those located in centromeric domains and telomeres. Centromeric repeat elements were manually identified using BLASTN and MEGABLAST (https://blast.ncbi.nlm.nih.gov). Each repeat element was named according to its sequence features (association with tDNAs and rDNAs) and location. The sequence of the wild-type (h90) S. pombe mating-type locus was obtained by manually merging nanopore and PacBio contigs using available data16, Supplementary Figure 10 and information at www.pombase.org/status/mating-type-region. Genome synteny alignment analysis was carried out using SyMAP 4.2 (refs 49,50), based on orthologous genes among the three genomes.
ChIP-qPCR
For analysis of CENP-ACnp1 association with minichromosomes bearing S. octosporus central-core DNA, three independent transformants with established centromere function (indicated by the ability to form sectored colonies) for each minichromosome were grown in PMG-ade-ura cultures and fixed with 3.7% formaldehyde for 15 min at room temperature. Cells were lysed by bead beating (Biospec), and ChIP was performed as previously described51. 10 μl of anti-CENP-ACnp1 sheep antiserum and 25 μl of Protein-G-Agarose beads (Roche) were used per ChIP. qPCR was performed using a LightCycler 480 and reagents (Roche) and analysed using LightCycler 480 Software 1.5 (Roche). Primers used in qPCR are listed in Supplementary Table 22. Mean %IP values for S.pom-cnt or S.oct-cnt on minichromosomes were normalised to %IP for endogenous S. pombe cnt1. Error bars represent standard deviation.
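The normalisation step can be sketched in Python as follows. This is a minimal illustration, assuming that each transformant's minichromosome %IP is divided by the %IP at endogenous cnt1 from the same ChIP before averaging (the text does not say whether ratios were taken per sample or on the means) and that the error bars are sample standard deviations; `normalised_ip` is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def normalised_ip(minichrom_ip: Sequence[float], cnt1_ip: Sequence[float]) -> Dict[str, float]:
    """Mean and SD of minichromosome %IP divided by endogenous cnt1 %IP.

    Index i of both sequences must come from the same transformant/ChIP sample.
    """
    if len(minichrom_ip) != len(cnt1_ip) or len(minichrom_ip) < 2:
        raise ValueError("need paired %IP values from at least two transformants")
    ratios = [m / c for m, c in zip(minichrom_ip, cnt1_ip)]
    # stdev() is the sample standard deviation (n - 1 denominator); an assumption here.
    return {"mean": mean(ratios), "sd": stdev(ratios)}
```

With invented %IP values of 2.0, 2.4 and 1.8 on a minichromosome and 2.0 at cnt1 in each sample, this gives a mean normalised value of about 1.03 with an SD of about 0.15.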
ChIP-seq
A modified ChIP protocol was used. Briefly, pellets containing 7.5 × 10⁸ cells were lysed by four 1-min pulses of bead beating in 500 μl of lysis buffer (50 mM HEPES-KOH, pH 7.5, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate), with resting on ice in between. The insoluble chromatin fraction was pelleted by centrifugation at 6000 g and washed with 1 ml lysis buffer before resuspension in 300 μl lysis buffer containing 0.2% SDS. Chromatin was sheared by sonication using a Bioruptor (Diagenode) for 30 minutes (30 s on/off, high setting). 900 μl of lysis buffer (no SDS) was added, samples were clarified by centrifugation at 17000 g for 20 minutes, and the supernatant was used for ChIP. 6 μl of anti-H3K9me2 mouse monoclonal antibody mAb5.1.1 (ref. 52; kind gift from Takeshi Urano) or 30 μl of sheep anti-CENP-ACnp1 antiserum51 was used, along with Protein-G Dynabeads (ThermoFisher Scientific) or Protein-G agarose beads (Roche), respectively. (For neocentromere strains, cells were first treated with Zymolyase 100T, washed in sorbitol and permeabilized. Chromatin was fragmented by incubation with micrococcal nuclease. Cell suspensions were adjusted to standard ChIP buffer conditions, and extracted chromatin was processed as per standard ChIP.) Immunoprecipitated DNA was recovered using Qiagen PCR purification columns. ChIP-seq libraries were prepared with 1-5 ng of ChIP DNA or 10 ng of input DNA. DNA was end-repaired using the NEB Quick Blunting Kit (E1201L). The blunt, phosphorylated ends were treated with Klenow fragment (3′→5′ exo–) (NEB, M0212S) and dATP. After ligation of NEXTflex adapters (Bioo Scientific), DNA was PCR-amplified with Illumina primers for 12-15 cycles, and library fragments of ~300 bp (insert plus adapter sequences) were selected using Ampure XP beads (Beckman Coulter). The libraries were sequenced following the Illumina HiSeq2000 workflow (or as indicated in Supplementary Table 21).
Defining fission yeast centromeres
CENP-ACnp1 and H3K9me2 ChIP-seq data were generated to identify centromere regions. Paired-end ChIP-seq reads (single-end for S. japonicus) were aligned to the updated genome sequences using Bowtie2 (ref. 53). Reads with mapping quality lower than 30, and read pairs mapping more than 500 nt or fewer than 100 nt apart, were discarded. ChIP-seq data were normalized to input data. Samtools54, Deeptools55 and IGV56 were subsequently used to generate sequence coverage files and to visualize the data. MACS2 (ref. 57) was used to detect CENP-ACnp1-enriched and heterochromatin (H3K9me2)-enriched regions of the genome.
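The read-filtering thresholds above can be expressed as a small Python sketch. It assumes that "apart" refers to the absolute SAM template length (TLEN) and applies the mapping-quality cutoff per pair for simplicity; `AlignedPair` and `filter_pairs` are hypothetical stand-ins (with pysam, comparable values come from `AlignedSegment.mapping_quality` and `AlignedSegment.template_length`). The insert-size filter applies only to paired-end data, not to the single-end S. japonicus reads.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class AlignedPair:
    """Hypothetical minimal record for one aligned read pair."""
    name: str
    mapq: int             # mapping quality
    template_length: int  # signed SAM TLEN; sign depends on which mate is leftmost


def filter_pairs(pairs: Iterable[AlignedPair], min_mapq: int = 30,
                 min_insert: int = 100, max_insert: int = 500) -> Iterator[AlignedPair]:
    """Keep pairs with MAPQ >= 30 whose mates map 100-500 nt apart (inclusive)."""
    for pair in pairs:
        insert = abs(pair.template_length)
        if pair.mapq >= min_mapq and min_insert <= insert <= max_insert:
            yield pair
```

Pairs exactly 100 or 500 nt apart are kept, because the text discards only pairs "over 500-nt or less than 100-nt apart".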
Centromere tDNA cluster analysis
To test for the enrichment of tDNA clusters at centromere regions a greedy search approach was used to identify potential clusters. All tDNAs less than 1000 bp apart were grouped into clusters. To test for significant clustering of tDNAs at the centromere the locations of tDNAs across the genome were shuffled 1000 times. For each cluster observed in the real genome the proportion of permutations where the same cluster was observed at least as many times was calculated to provide estimates of significance. Following conversion of these p-values to q values to account for multiple testing, the centromere tDNA clusters each exhibited a q-value less than 0.005.
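The greedy grouping, permutation test and p-to-q conversion described above can be sketched in Python. This is an illustration under stated assumptions, not the analysis code: a "cluster" is identified by the ordered tuple of its tDNA amino-acid labels, shuffled tDNAs are placed uniformly along one concatenated genome coordinate (ignoring chromosome boundaries), and Benjamini-Hochberg is assumed as the q-value method because the text does not name one. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Tdna = Tuple[int, str]  # (genomic position in bp, single-letter amino-acid code)


def greedy_clusters(tdnas: Sequence[Tdna], max_gap: int = 1000) -> List[Tuple[str, ...]]:
    """Group position-sorted tDNAs lying < max_gap bp from their neighbour."""
    clusters: List[Tuple[str, ...]] = []
    current: List[Tdna] = []
    for pos, aa in sorted(tdnas):
        if current and pos - current[-1][0] >= max_gap:
            if len(current) > 1:  # singletons are not clusters
                clusters.append(tuple(a for _, a in current))
            current = []
        current.append((pos, aa))
    if len(current) > 1:
        clusters.append(tuple(a for _, a in current))
    return clusters


def permutation_pvalues(tdnas: Sequence[Tdna], genome_length: int,
                        n_perm: int = 1000, seed: int = 0) -> Dict[Tuple[str, ...], float]:
    """Empirical p-value for each observed cluster type from relocated tDNAs."""
    rng = random.Random(seed)
    observed = Counter(greedy_clusters(tdnas))
    labels = [aa for _, aa in tdnas]
    hits: Counter = Counter()
    for _ in range(n_perm):
        # Assumption: each tDNA is moved to a uniformly random genome position.
        shuffled = [(rng.randrange(genome_length), aa) for aa in labels]
        perm_counts = Counter(greedy_clusters(shuffled))
        for cluster, n_obs in observed.items():
            if perm_counts[cluster] >= n_obs:
                hits[cluster] += 1
    # Proportion of permutations reproducing each cluster at least as often as observed.
    return {cluster: hits[cluster] / n_perm for cluster in observed}


def bh_qvalues(pvalues: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Benjamini-Hochberg q-values (assumed method)."""
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ranked)
    qvalues: Dict[Hashable, float] = {}
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        key, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        qvalues[key] = running_min
    return qvalues
```

Because the p-value here is a plain proportion of permutations, a cluster never reproduced by chance receives p = 0 in this sketch; some analyses instead use (hits + 1)/(n_perm + 1) so that the smallest attainable p-value reflects the number of permutations.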
Hsp16 gene tree analysis
hsp16 paralogs in the S. octosporus and S. cryophilus genomes were predicted using BLASTP. The predicted protein sequences of hsp16 genes from all four fission yeasts were aligned, together with those from S. cerevisiae, using Clustal Omega. BEAST (Bayesian Evolutionary Analysis Sampling Trees)58 and FigTree (http://tree.bio.ed.ac.uk/software/figtree/) were used to generate and view the hsp16 gene phylogenetic tree.
5-mer frequency PCA analysis
The CENP-ACnp1-associated sequences in the S. pombe, S. cryophilus and S. octosporus genomes are all approximately 12 kb in length. Each genome was therefore split into 12 kb sliding windows with a 4.5 kb overlap. The frequency of each 5-mer was calculated in each window using Jellyfish59. CENP-ACnp1-associated regions showed a general enrichment of AT base pairs relative to the genome as a whole. To normalize for GC content amongst the windows, the bases in each sequence window were randomly shuffled to generate 1000 artificial sequences with the same GC content. 5-mer frequencies were then recalculated for each of these 1000 artificial sequences, and the true 5-mer frequencies were compared to these background frequencies by calculating a z-score. These enrichment scores therefore represent the k-mer enrichments in a given sequence normalized for GC content. Genome windows were split into 6 groups: CENP-ACnp1-associated sequences (CENP-ACnp1 peaks covering more than 6 kb of sequence); outer-repeat heterochromatin regions (more than half the window covered by H3K9me2 peaks adjacent to CENP-A domains); subtelomeric regions (more than half the window covered by H3K9me2 peaks and close to the end of a chromosome); the mating-type locus; neocentromere regions (identified using CENP-ACnp1 ChIP-seq data from S. pombe neocentromere-containing strains29); and remaining genome sequences. Principal components were computed using FactoMineR, and logistic regression and mean comparison were used to determine whether principal components were linked to the probability of a sequence belonging to a particular sequence group60.
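The windowing and composition-preserving z-score normalisation can be sketched in pure Python. This is a minimal illustration, not the pipeline used: a simple dictionary counter stands in for Jellyfish, windows are taken on a single uppercase sequence, the population standard deviation of the shuffled counts is assumed, and the helper names are hypothetical. The PCA and logistic-regression steps (done with FactoMineR, an R package) are not shown.

```python
import random
from itertools import product
from statistics import mean, pstdev
from typing import Dict, Iterator, List, Tuple

K = 5
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 1024 5-mers


def sliding_windows(seq: str, size: int = 12_000,
                    overlap: int = 4_500) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs; consecutive windows share `overlap` bp."""
    step = size - overlap  # 7.5 kb step for 12 kb windows with a 4.5 kb overlap
    for start in range(0, max(len(seq) - size, 0) + 1, step):
        yield start, seq[start:start + size]


def kmer_counts(seq: str) -> Dict[str, int]:
    """Count every 5-mer; k-mers containing N or other symbols are skipped."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in counts:
            counts[kmer] += 1
    return counts


def kmer_zscores(window: str, n_shuffles: int = 1000, seed: int = 0) -> Dict[str, float]:
    """z-score of each observed 5-mer count against base-shuffled copies of the window."""
    rng = random.Random(seed)
    observed = kmer_counts(window)
    background: Dict[str, List[int]] = {km: [] for km in KMERS}
    bases = list(window)
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # same base composition, hence same GC content
        for km, c in kmer_counts("".join(bases)).items():
            background[km].append(c)
    zscores = {}
    for km in KMERS:
        mu, sd = mean(background[km]), pstdev(background[km])
        # k-mers whose counts never vary under shuffling get z = 0
        zscores[km] = (observed[km] - mu) / sd if sd > 0 else 0.0
    return zscores
```

Shuffling the bases of a window, rather than drawing bases at a target GC fraction, keeps the composition of every artificial sequence exactly equal to that of the original window.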
Construction of minichromosomes
S. octosporus central-core regions were amplified with the primers indicated in Supplementary Table 22. Fragments were digested with BglII and NcoI, or with BamHI and NcoI, and ligated into BglII-NcoI-digested plasmid pK(5.6kb)-MCS-ΔBam, which contains a 5.6 kb fragment of the S. pombe K (dg) outer repeat. To create plasmid pK-So-cnt2-10kb, an additional 3.6 kb region from S.oct-cnt2 was inserted as a BamHI-SalI fragment into BglII-SalI-digested pK-So-cnt2-6.5kb to make a 10 kb region of S. octosporus central core. For pKp plasmids, S. octosporus central-core regions were inserted as BglII-NcoI or SalI-BamHI fragments into SalI-BamHI- or NcoI-BamHI-digested plasmid pKp (pMC91), which contains a 2 kb region of the S. pombe K (dg) outer repeat. Plasmids are listed in Supplementary Table 23.
Centromere establishment assay
Strain A7373 or A7408, each of which contains integrated nmt41-GFP-CENP-ACnp1 to allow high-level expression of CENP-A15, was grown in PMG-complete medium and transformed using the sorbitol electroporation method61. Cells were plated on PMG-uracil-adenine plates and incubated at 32°C for 5-10 days until medium-sized colonies had grown. Colonies were replica-plated to PMG low-adenine (10 μg/ml) plates to determine the frequency of establishment of centromere function. These indicator plates allow minichromosome loss (red) or retention (white/pale pink) to be determined. Minichromosome retention indicates that centromere function has been established and that minichromosomes segregate efficiently in mitosis. In the absence of centromere establishment, minichromosomes behave as episomes that are rapidly lost. Minichromosomes occasionally integrate, giving a false-positive white phenotype. To assess the frequency of such integration events and to confirm establishment of centromere segregation function, a proportion of colonies giving the white/pale-pink phenotype upon replica plating were re-streaked to single colonies on low-adenine plates. Sectored colonies indicate segregation function with low levels of minichromosome loss, whereas pure white colonies indicate integration into endogenous chromosomes; the establishment frequency was adjusted accordingly.
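One plausible reading of "adjusted accordingly" is sketched below in Python: the raw fraction of white/pale-pink colonies is scaled by the fraction of re-streaked white/pale-pink colonies that gave sectored (rather than pure white, integrated) colonies. This adjustment rule is an assumption, not the authors' stated formula, and `adjusted_establishment_frequency` is a hypothetical helper.

```python
def adjusted_establishment_frequency(total_colonies: int, white_or_pale_pink: int,
                                     restreaked: int, sectored: int) -> float:
    """Percentage of transformants with established centromere function.

    Assumed rule: raw white/pale-pink fraction x fraction of re-streaked
    white/pale-pink colonies that sectored (i.e. were not integrants).
    """
    if total_colonies <= 0 or restreaked <= 0:
        raise ValueError("colony counts must be positive")
    if not (0 <= white_or_pale_pink <= total_colonies and 0 <= sectored <= restreaked):
        raise ValueError("inconsistent colony counts")
    raw_fraction = white_or_pale_pink / total_colonies
    non_integrant_fraction = sectored / restreaked
    return 100 * raw_fraction * non_integrant_fraction
```

For example, with invented counts of 60 white/pale-pink colonies out of 200, of which 15 of 20 re-streaked colonies sectored, the adjusted establishment frequency would be 22.5%.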
Minichromosome stability assay
Minichromosome loss frequency was determined by half-sector assay. Briefly, transformants containing minichromosomes with established centromere function were grown in PMG-ade-ura to select for cells containing the minichromosome. Two transformants were analysed per minichromosome (four for pK-So-cnt2-4.7kb). Cells were plated on low-adenine plates and allowed to grow non-selectively for 4-7 days. Minichromosome loss is indicated by red sectors and retention by white sectors. To determine the loss rate per division, all colonies were examined under a dissecting microscope. All colonies except pure reds were counted to give the total number of colonies. Pure red colonies were checked for the absence of white sectors and were excluded, because they had lost the minichromosome before plating. To identify colonies that lost the minichromosome in the first division after plating, ‘half-sectored’ colonies were counted; these included any colony that was at least 50% red (including those with only a tiny white sector). The loss rate per division was calculated as the number of half-sectored colonies as a percentage of all (non-pure-red) colonies.
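The scoring rule above can be written as a short Python function. As a hypothetical simplification, each colony is represented by the fraction of its area that is red (1.0 for a pure red colony); in practice colonies were scored by eye, and `loss_rate_per_division` is an illustrative name.

```python
from typing import Iterable


def loss_rate_per_division(red_fractions: Iterable[float]) -> float:
    """Percent minichromosome loss per division from a half-sector assay.

    Each value is the fraction of a colony that is red (1.0 = pure red).
    """
    # Pure reds lost the minichromosome before plating and are excluded.
    scored = [f for f in red_fractions if f < 1.0]
    if not scored:
        raise ValueError("no scorable (non-pure-red) colonies")
    # Half-sectored: at least 50% red but not pure red.
    half_sectored = sum(1 for f in scored if f >= 0.5)
    return 100 * half_sectored / len(scored)
```

For example, 3 half-sectored colonies among 98 scorable colonies (pure reds excluded) correspond to a loss rate of about 3.1% per division.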
Immunolocalisation
For localisation of CENP-ACnp1, Schizosaccharomyces cultures were fixed with 3.7% formaldehyde for 7 min, before processing for immunofluorescence as described51. Anti-CENP-ACnp1 sheep antiserum51 (raised to the N-terminal 19 amino-acids of S. pombe CENP-ACnp1) was used at 1:1000 dilution, and Alexa-488-coupled donkey-anti-sheep secondary antibody (A11015; Invitrogen) at 1:1000 dilution. Cells were stained with DAPI and mounted in Vectashield. Microscopy was performed with a Zeiss Imaging 2 microscope (Zeiss) using a 100x 1.4NA Plan-Apochromat objective, Prior filter wheel, illumination by HBO100 mercury bulb. Image acquisition with a Photometrics Prime sCMOS camera (Photometrics, https://www.photometrics.com) was controlled using Metamorph software (Universal Imaging Corporation). Exposures were 1500 ms for FITC/Alexa-488 channel and 300-1000 ms for DAPI. Images shown in Figure 2a are autoscaled.
To express GFP-tagged versions of Schizosaccharomyces CENP-ACnp1 proteins in S. pombe, ORFs were amplified from the relevant genomic DNA using primers listed in Supplementary Table 22. Fragments were digested with NdeI-BamHI or NdeI-BglII and ligated into NdeI-BamHI-digested pREP41X-GFP vector62 (Supplementary Table 23). For detection of GFP-tagged Schizosaccharomyces CENP-ACnp1 proteins in S. pombe, cells containing pREP41X-GFP-CENP-ACnp1 episomal plasmids (variable copy number) were grown in PMG-leu + thiamine to allow low GFP-CENP-ACnp1 expression. Cells were fixed and processed for immunolocalisation and microscopy as above. Anti-GFP antibody (A11122; Invitrogen) was used at 1:300, and anti-Cdc11 (ref. 51; a spindle-pole body marker; gift from Ken Sawin) at 1:600. Secondary antibodies were, respectively, Alexa-488-coupled chicken anti-rabbit (A21441; Invitrogen) and Alexa-594-coupled donkey anti-sheep (A11016; Invitrogen), both at 1:1000. Exposures were 1500 ms for FITC/488, 1000 ms for TRITC/594 and 500-1000 ms for DAPI. For display of images in Figure 5c, TRITC/594 and FITC/488 images are scaled relative to the maximum intensity in the set of images, whilst DAPI images are autoscaled.
Data Availability
All data generated in this study have been submitted to GEO under accession number: GSE112454. SRA submission number for S. pombe nanopore sequencing data: SUB3761672. The following figures have associated raw data: 1, 2, S1, S2, S3, S5.
Author Contributions
R.C.A. and A.L.P. designed the study. P.T. performed the PacBio genome assemblies and bioinformatics, ChIP-seq analysis and PCA analysis. C.M. performed the nanopore sequencing of S. pombe, supervised by C.N. H.B., N.R.T.T., J.T.-G. and R.A. generated ChIP-seq data with contributions from M.S. A.L.P. performed cytology, analysis of repetitive regions, and experiments on cross-species functionality. R.C.A. supervised the study. A.L.P. wrote the manuscript with contributions from P.T., R.C.A. and other authors. All authors read and approved the final version of the manuscript.
Competing Financial Interests
The authors declare no competing financial interests.
Acknowledgments
We thank Alastair Kerr, Shaun Webb and Daniel Robertson for bioinformatics support, David Kelly for microscopy support, Ken Sawin and Takeshi Urano for antibodies, and Kojiro Ishii, Ken Sawin and Nick Rhind for yeast strains. We thank Robert Lyons, Joe Washburn, Christina McHenry (University of Michigan) and Greg J. Hannon, Richard McCombie, Eric Antoniou and Sara Goodwin (CSHL) for PacBio sequencing. We are grateful to Chris Ponting for advice and comments on the manuscript and Sandra Catania and other members of the Allshire and Heun labs for helpful discussions. N.R.T.T., R.A. and J.T-G. were supported by the Darwin Trust of Edinburgh. The Darwin Trust and a Principal’s Career Development scholarship supported N.R.T.T. P.T. was partly supported by funding from the European Commission Network of Excellence EpiGeneSys (HEALTH-F4-2010-257082) and a Wellcome Enhancement Award (095021) to R.C.A. R.C.A. is a Wellcome Principal Research Fellow (095021, 200885); the Wellcome Centre for Cell Biology is supported by core funding from Wellcome (203149). C.A.M. and C.A.N. are supported by Biotechnology and Biological Sciences Research Council (BBSRC) grant BB/N016858/1 and Wellcome Investigator Award 110064/Z/15/Z. Pacific Biosciences (PacBio) sequencing carried out at the CSHL Cancer Center Next Generation Genomics Shared Resource, which is supported by Cancer Center Support Grant 5P30CA045508, was paid for by a kind gift from Kathryn W. Davis to G.J.H.