25 years of propagation in suspension cell culture results in substantial alterations of the Arabidopsis thaliana genome

Arabidopsis thaliana is one of the best-studied plant model organisms. Besides cultivation in greenhouses, cells of this plant can also be propagated in suspension cell culture. At7 is one such cell line, which was established about 25 years ago. Here we report the sequencing and analysis of the At7 genome. Large-scale duplications and deletions compared to the Col-0 reference sequence were detected. The number of deletions exceeds the number of insertions, indicating that a reduction of the haploid genome size is ongoing. Patterns of small sequence variants differ from those observed between A. thaliana accessions; for example, the number of single nucleotide variants matches the number of insertions/deletions. RNA-Seq analysis reveals that disrupted alleles are less frequent in the transcriptome than the native ones.


Arabidopsis thaliana is a small flowering plant which is distributed over the northern hemisphere and has become the model system of choice for research in plant biology. In 2000, the genome sequence was released as the first available plant genome sequence [1]. After generating this reference sequence from the accession Columbia-0 (Col-0), many other A. thaliana accessions were analyzed by sequencing to investigate, among many other topics, genomic diversity, local adaptation and the phylogenetic history of this species [2][3][4]. While most initial re-sequencing projects relied on short read mapping against the Col-0 reference sequence [5][6][7], technological progress enabled de novo genome assemblies [2,8] which reached a chromosome-level quality [9][10][11]. Low coverage nanopore sequencing was also applied to search for genomic differences, e.g. active transposable elements (TEs) [12].

Due to the high value of A. thaliana for basic plant biology research, it is frequently grown in greenhouses under controlled and optimized conditions. Previous studies investigated the mutation rates within a single generation [6,13]. Mutational changes appear to be different between plants grown under controlled conditions and natural samples collected in the environment [6]. Another approach harnessed an A. thaliana population in the United States of America, which is assumed to originate from a single ancestor, thus showing mutations accumulated over the last decades [14]. This study investigated modern and ancient specimens and estimated a rate of 7.1 × 10^-9 substitutions per site per generation [14]. Since not all mutations are fixed during evolution, the mutation rate is higher than the substitution rate.
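To give a feeling for the scale of the substitution rate quoted above, the following minimal sketch converts a per-site rate into an expected number of substitutions per haploid genome per generation. The genome size of roughly 119 Mbp is an assumption introduced here for illustration (an approximate size of the Col-0 reference sequence) and is not a value taken from this text; the function name is hypothetical.

```python
def substitutions_per_genome(rate_per_site, genome_size_bp):
    """Expected substitutions per haploid genome per generation."""
    return rate_per_site * genome_size_bp


# Rate reported in [14]: 7.1 x 10^-9 substitutions per site per generation.
RATE_PER_SITE = 7.1e-9

# ASSUMPTION: approximate size of the Col-0 reference sequence in bp.
ASSUMED_GENOME_SIZE_BP = 119_000_000

expected = substitutions_per_genome(RATE_PER_SITE, ASSUMED_GENOME_SIZE_BP)
print(f"{expected:.2f} substitutions per haploid genome per generation")
```

Under these assumptions the expectation stays below one substitution per haploid genome per generation.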

Even further away from natural conditions in the environment is the propagation of cells in suspension cultures. Cells from such cultures can easily be employed for transient transfection experiments [15]. Transient transfections of At7 protoplasts are a relatively straightforward method to study promoter structure and activity and to investigate the interactions between transcription factors and promoters of putative target genes [16,17].
Since most functions of plants are dispensable in suspension culture, it was expected that mutations in these dispensable genes accumulate over time due to genetic drift or even due to positive selection. We

Mapping of the At7 Illumina short reads and ONT long reads to the Col-0 reference sequence was used to assess genomic changes in At7. The coverage analysis revealed duplications and deletions of large chromosomal segments (Figure 1, File S1). Similar regional variations in ploidy between neighbouring chromosomal segments are common in immortalized insect and mammalian cell lines and tumors, where they may be an advantage to cells [36,37]. In the At7 suspension cell culture, about 5 Mbp at the northern end of chromosome 2 (Chr2) and chromosome 4 (Chr4) appear highly fragmented. In addition, regions around the centromeres are apparently fragmented, but this could be an artifact of higher repeat content and a substantial proportion of collapsed peri-centromeric and centromeric sequences. These differences in chromosomal stability seem to be consistent between plants and animals since similar observations were also reported for e.g. Chinese hamster ovary (CHO) and Drosophila cell lines [36,37].

plastome is about 10 times higher than observed in native plants [11,46] (Figure S3). The increased number of mitochondria could be due to the specific conditions cultured At7 cells are exposed to.
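The idea behind the coverage analysis of mapped reads described above can be illustrated with a minimal sketch. It averages per-position read depth in fixed windows and labels each window by its ratio to the median window depth. The function name, the window size and the ratio thresholds are hypothetical choices made for this illustration and do not reflect the tools or parameters used in this study.

```python
from statistics import median


def classify_windows(depths, window=4, low=0.5, high=1.5):
    """Label fixed-size windows by their mean depth relative to the median.

    depths: per-position read depth along one chromosome.
    window, low, high: ASSUMED illustrative values, not from the study.
    """
    # Mean depth of each consecutive window (the last one may be shorter).
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))

    # The median window depth serves as the baseline copy number.
    baseline = median(means) if means else 0.0

    labels = []
    for mean_depth in means:
        ratio = mean_depth / baseline if baseline else 0.0
        if ratio <= low:
            labels.append("deletion")
        elif ratio >= high:
            labels.append("duplication")
        else:
            labels.append("normal")
    return labels


# Toy depth profile: baseline 30x, one doubled segment, one depleted segment.
toy_depths = [30] * 8 + [60] * 4 + [5] * 4 + [30] * 8
print(classify_windows(toy_depths))
```

Real data would need much larger windows and special handling of repeat-rich regions such as the centromeres, where collapsed sequences can distort coverage as noted above.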

This depletion of InDels inside of protein encoding regions indicates that at least residual selection against disruption of these sequences is still ongoing.

Assessment of the functional impact of small sequence variants revealed a high impact effect (e.g. premature stop codon or frameshift) on a total of 2,189 genes (File S4). This high number can be explained by functional redundancy due to multiple alleles, i.e. at least one allele is maintained in a functional state. In addition, many genes might be dispensable under stable, stress-free cell culture conditions. Therefore, the accumulation of disruptive variants or entire deletions is feasible. We refrained from gene ontology (GO) enrichment analysis due to the high functional redundancy caused by the presence of multiple alleles for most genes.

RNA-Seq analysis revealed a substantial difference in the abundance of native alleles and defective

The N50 of 1.2 Mbp is substantially lower than the values reported for other recent A. thaliana genome assemblies [9-11].
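For readers unfamiliar with the metric, the N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of the standard calculation follows; the function name and the toy contig lengths are illustrative only and are not data from the At7 assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy example with made-up contig lengths.
print(n50([10, 8, 6, 4, 2]))
```

Because the metric depends on the longest contigs, every point at which the assembler has to break a contig lowers the N50.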

We speculate that points with abrupt changes in coverage of the At7 genome deteriorated the assembly