cT-DNA in Linaria vulgaris L. is multicopy, inverted and homogenized

The evolutionary fixation of agrobacterial sequences (cT-DNA, or cellular transferred DNA) in plant genomes is a well-known natural phenomenon. It was previously believed that all cT-DNA-containing species except Linaria vulgaris carry multiple inverted cT-DNA repeats. A detailed study of the general features of cT-DNA brings us closer to understanding the causes and mechanisms of its fixation in plant genomes. We combined multiple long-range PCRs with genome walking to study the extended structure of cT-DNA, and estimated the copy number of cT-DNA elements by digital PCR. Low-coverage NGS allowed us to develop a set of microsatellite markers, which were also used for copy number estimation. According to the new data, the cT-DNA elements of L. vulgaris form a complex inverted repeat of two simple direct repeats. After integration, the cT-DNA sequence was duplicated at least twice. Concerted evolution of cT-DNA sequences, together with some details of this process, is shown here for the first time. We have shown that L. vulgaris, like other cT-DNA-containing species, has an inverted repeat structure. This points to possible general causes and mechanisms of cT-DNA fixation in plant genomes during evolution.

…single T-DNA sequence, which is found in the agrobacterial Ri plasmid). However, the sequence EU735069.2 reported by Matveeva et al. 2012 was obtained without taking into account that the T-DNA might be present in many copies. We therefore studied the T-DNA sequence in more detail by sequencing long-range PCR products obtained with unique primers. These unique primers were designed for the sites where elements join each other and where they join plant DNA. The sequences of these joining sites were determined using GenomeWalker (Siebert et al. 1995). Six unique joining sites were sequenced. Three of them (designated β, β′ and γ) are formed by direct repeats, and three (ω, ο′ and φ) are formed by inverted repeats (Figure 1). Inverted repeats are shown here for L. vulgaris for the first time. Each primer designed for a joining region produced a single target PCR product, so the joining-region sequences cannot be PCR artifacts. The relative positions of the elements were established by long-range PCR, and the resulting products containing the sequences of individual elements were sequenced (Table 3).
Figure 1. … of the T-DNA block in L. vulgaris; b) sequence differences between sites γ, β and β′; c) structure of α (junction of element I with plant DNA); d) structure of the joining sites formed by inverted repeats (ω, φ, ο′).
Full-size blocks of fragments were sequenced (Fig. 2b), as well as individual fragments (20.2, 160.1, 20.5, 18.2, 20.1, 135.11 and 206.2) that could be their allelic variants. Analysis of these sequences allowed us to reconstruct the T-DNA structure in L. vulgaris (Fig. 2a). The block of T-DNA elements is a complex inverted repeat consisting of two simple direct repeats, with deletion variants of various sizes. Element positions are marked with Roman numerals (I–IV, from right to left), and the joining sites between elements are marked with Greek letters. The most intact joining site is γ, between elements I and II (Fig. 1b). The β site lies between elements III and IV and differs from γ by a large deletion. In place of the deleted fragment, β contains a 15-bp sequence that is absent from γ. This sequence looks like typical "filler DNA" (Neve et al. 1997), which is formed when T-DNA integrates into the genome at the moment of transformation. The β site has a short allelic variant with a nested deletion (β′), which also contains traces of filler DNA (Fig. 1b). The joining sites formed by inverted repeats (ω, φ, ο′) contain no filler DNA. They appear to result from large deletions that occurred after the transformation event (Fig. 1d).
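The distinction between direct and inverted joining sites reduces to comparing the strand orientation of neighbouring copies. The sketch below uses hypothetical toy sequences and hypothetical function names. It treats a copy as "+" when it matches a reference element and "−" when it matches the reverse complement. It then labels each junction in the left-to-right order IV, III, II, I. Which pair is called "+" is an arbitrary assumption. Real copies carry deletions and SNPs, so a real analysis would decide orientation by alignment rather than exact string matching.

```python
# Hypothetical illustration: toy sequences, not the real L. vulgaris elements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orientation(copy: str, reference: str) -> str:
    """'+' if the copy equals the reference, '-' if it equals its reverse complement."""
    if copy == reference:
        return "+"
    if copy == revcomp(reference):
        return "-"
    raise ValueError("copy matches neither strand of the reference")


def classify_junctions(copies: list, reference: str) -> list:
    """Label each junction between neighbouring copies as 'direct' or 'inverted'."""
    strands = [orientation(c, reference) for c in copies]
    return ["direct" if a == b else "inverted" for a, b in zip(strands, strands[1:])]


REF = "ATGGCCTTAG"  # hypothetical element sequence
# Left-to-right order IV, III, II, I (elements are numbered from right to left).
block = [revcomp(REF), revcomp(REF), REF, REF]
print(classify_junctions(block, REF))  # ['direct', 'inverted', 'direct']
```

In this toy layout the outer junctions (III–IV, like β; I–II, like γ) are direct, and the central junction is of the inverted type. This matches a complex inverted repeat built from two direct repeats.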
In addition, we found that the number of sequence variants exceeds two at almost every position (Fig. 2b). L. vulgaris is diploid (Tandon and Bali 1957), and a single-copy sequence cannot have more than two alleles. The L. vulgaris genome must therefore contain several such T-DNA blocks. At position III, five individual sequence variants were found (94.7, 106.10, 20.1, 9.13 and 160.1; Fig. 2b), corresponding to at least three blocks, and the integration site (α) was identical in all four cases.
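The "at least three blocks" figure follows from a simple bound. If every variant comes from the same diploid individual, as the argument assumes, each locus can contribute at most two variants. The minimum number of loci is therefore the number of variants divided by the ploidy, rounded up. The helper name `min_loci` is hypothetical.

```python
import math


def min_loci(n_variants: int, ploidy: int = 2) -> int:
    """Smallest number of loci able to carry n distinct variants in one individual,
    assuming each locus contributes at most `ploidy` alleles."""
    return math.ceil(n_variants / ploidy)


# Variant labels reported for position III.
position_III = ["94.7", "106.10", "20.1", "9.13", "160.1"]
print(min_loci(len(position_III)))  # 3
```

The bound holds only within a single individual. Variants pooled from several plants could exceed two per locus without implying extra copies.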
We analyzed the state of the ORFs in each element. All sequenced T-DNA genes except orf13a were pseudogenized: every gene other than orf13a carried premature stop codons or frameshifting deletions. The orf13a gene lies between orf13 and orf14 and is potentially intact in sequences 20.1, 9.13 and 206.2.
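The two signs of pseudogenization named above, premature stop codons and frame-shifting deletions, can be checked mechanically. The sketch below is a deliberately crude classifier with a hypothetical function name. It assumes the region homologous to the reference ORF has already been extracted from an alignment, starting at the start codon. It flags a length that is not a multiple of three as a frameshift and any in-frame stop codon before the last codon as premature. It uses the standard genetic code stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def orf_status(cds: str) -> str:
    """Crude status of a coding region read from its start codon (hypothetical helper)."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        return "no start codon"
    if len(cds) % 3 != 0:
        return "frameshift (length not a multiple of 3)"
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Any stop codon before the final codon truncates the protein.
    for number, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            return f"premature stop at codon {number}"
    if codons[-1] not in STOP_CODONS:
        return "no terminal stop codon"
    return "potentially intact"


print(orf_status("ATGAAATTTTAA"))  # potentially intact
print(orf_status("ATGTAGTTTTAA"))  # premature stop at codon 2
print(orf_status("ATGAATTTTAA"))   # frameshift (length not a multiple of 3)
```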
To determine the cT-DNA copy number, we performed digital PCR with an assay designed for the cT-DNA α sequence. The Linaria genome is poorly studied and no single-copy genes are known, so we used previously selected microsatellite markers as reference sequences. To find candidate microsatellite loci, the genome was sequenced at low coverage on the Ion Torrent System. Twenty-three suitable reads containing repeats were selected, and primers were designed for them. All primers were tested on DNA from plants collected in geographically distant regions (Peterhof, Russia, and Khakassia, Russia). The PCR products were then subjected to fragment analysis to assess marker suitability. The suitability criteria were: 1) no more than two peaks of different length in each plant studied (indirect evidence of single copy number); 2) at least one peak of the predicted length in the plants sequenced on the Ion Torrent System. The markers that met all requirements are listed in Table 2. TaqMan probes were designed for markers M006, M009, M010, M014, M015 and M019, and the efficiency of these assays was evaluated by real-time PCR. Marker M006 was excluded because it was multicopy, and the most efficient of the remaining assays, based on M009, was selected. Digital PCR was then used to compare the copy number of the α sequence with that of M009. The ratio obtained was 2.19 (1528 copies of the α sequence versus 697 copies of M009, each in 4590 reaction cells).
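The marker-selection criteria and the final ratio can be restated as a short sketch. The `MarkerScan` record, the peak-size `tolerance` and the plant identifiers below are hypothetical. They are not the data structures or settings used in this work. Both digital PCR counts come from the same number of reaction cells (4590), so the ratio of the counts equals the ratio of the concentrations.

```python
from dataclasses import dataclass


@dataclass
class MarkerScan:
    """Fragment-analysis result for one candidate marker (hypothetical structure)."""
    name: str
    predicted_length: int                 # amplicon length predicted from the read, bp
    peaks_per_plant: dict[str, set[int]]  # plant id -> distinct peak lengths, bp
    sequenced_plant: str                  # plant whose DNA was sequenced


def is_suitable(marker: MarkerScan, tolerance: int = 1) -> bool:
    # Criterion 1: at most two peaks of different length in every plant (proxy for single copy).
    if any(len(peaks) > 2 for peaks in marker.peaks_per_plant.values()):
        return False
    # Criterion 2: at least one peak of the predicted length in the sequenced plant.
    peaks = marker.peaks_per_plant.get(marker.sequenced_plant, set())
    return any(abs(p - marker.predicted_length) <= tolerance for p in peaks)


def copy_ratio(target_copies: int, reference_copies: int) -> float:
    """Target/reference ratio; valid when both counts come from the same number of cells."""
    return target_copies / reference_copies


# Hypothetical candidate marker, for illustration only.
candidate = MarkerScan("hyp_A", 152, {"Peterhof_1": {150, 152}, "Khakassia_1": {148}}, "Peterhof_1")
print(is_suitable(candidate))           # True
print(round(copy_ratio(1528, 697), 2))  # 2.19
```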

Discussion
We have shown that the cT-DNA repeats of L. vulgaris have an inverted structure. This orientation of elements upon integration into the genome is frequently observed, though not obligatory. Neve et al. (1997) showed that the frequency of inverted repeats depends on the type of T-DNA and ranges from 25 to 50% of all cases of multiple-element integration.
Single T-DNA elements can also integrate. However, the vast majority of known cT-DNAs … We assume that the cT-DNA elements undergo (or at least underwent in the past) homogenization, or "concerted evolution". This process resembles the mechanism that homogenizes ribosomal RNA gene repeats and other multicopy sequences and maintains their high identity (Pavelitz and Rusche 1995; Liao 1999). The spread of variants carrying large deletions by this mechanism has also been described (Pavelitz and Rusche 1995).
Further evidence for this hypothesis comes from the distribution of SNPs in the T-DNA elements, which cannot be explained by divergent evolution. Phylogenetic trees built from different sets of SNP sites contradict each other (Fig. 4). The alignment looks as if the element sequences had been copied in fragments 25–50 bp long. This pattern cannot be explained by PCR artifacts such as template switching ("chain change") either, because the elongation time in the amplification programs was sufficient (see Materials and methods). Moreover, the mosaic pattern appears at very short intervals (25–50 bp). If it were caused by template switching, switches would have to occur every 25–50 bp, which is impossible. We therefore cannot propose any hypothesis other than concerted evolution that explains the observed pattern. According to our data, homogenization proceeds by replacement of short (25–50 bp) fragments of the sequence rather than of whole repeat elements. Linaria vulgaris cT-DNA could therefore become a valuable model for studying concerted evolution.
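One standard way to make "trees built from different SNP sites contradict each other" concrete is the four-gamete test. This is a swapped-in illustration, not the analysis behind Fig. 4. For two biallelic sites, a single tree without recombination, gene conversion or repeated mutation can produce at most three of the four possible two-site haplotypes. Observing all four flags an incompatible pair. The sketch assumes a gap-free alignment, and the toy sequences are hypothetical.

```python
from itertools import combinations


def four_gamete_conflicts(alignment: dict[str, str]) -> list[tuple[int, int]]:
    """Return pairs of biallelic columns in which all four two-site haplotypes occur."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    # Keep only biallelic columns (exactly two distinct bases across the copies).
    snp_columns = [i for i in range(length) if len({s[i] for s in seqs}) == 2]
    conflicts = []
    for i, j in combinations(snp_columns, 2):
        haplotypes = {(s[i], s[j]) for s in seqs}
        # Four haplotypes cannot arise on one tree without recombination,
        # gene conversion or a repeated mutation at one of the sites.
        if len(haplotypes) == 4:
            conflicts.append((i, j))
    return conflicts


# Hypothetical toy alignment of four element copies (not real L. vulgaris data).
toy = {
    "copy1": "AGACG",
    "copy2": "AGTCG",
    "copy3": "CGACT",
    "copy4": "CGTCT",
}
print(four_gamete_conflicts(toy))  # [(0, 2), (2, 4)]
```

In the toy data, column 0 groups copies 1+2 against 3+4, while column 2 groups 1+3 against 2+4. This is exactly the kind of contradiction between site-specific trees described above.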

Conclusions
Based on these results, we conclude the following. The structure of cT-DNA in Linaria vulgaris has been studied in greater detail. The cT-DNA elements form inverted repeats, a feature now shown for all cT-DNA-containing plants. The cT-DNA unit is organized as a complex inverted repeat of two simple tandem repeats, and this block was duplicated at least twice after integration into the genome. The cT-DNA elements are subject to homogenization.

Materials and methods
DNA was isolated from L. vulgaris leaves using the CTAB method (Doyle and Doyle 1987).
Specific primers were designed for the cT-DNA junction sites where repeat elements join each other or the adjacent plant DNA (Fig. 1). Previously unknown junction sites were sequenced using the GenomeWalker approach (Siebert et al. 1995). Each specific primer was checked for performance in combination with a set of non-unique primers of the opposite orientation. These reactions also gave the approximate position of the adjacent element boundary.
Long-range PCR was performed with various combinations of specific primers. The relative location of the unique sites was thus inferred from the presence or absence of a PCR product. When a product was obtained, sequencing this long-range PCR product yielded a clean sequence of a single element.
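The presence/absence logic can be sketched for the simplest case: one anchor primer (for example at a junction with plant DNA) paired in turn with primers at other unique sites. Sites that give a product can be ordered by amplicon length along the block. Sites that give none are set aside, since the target may lie in another block, face the wrong way or be beyond long-range PCR reach. The site names and lengths below are hypothetical. With several blocks in the genome, real data are messier, because one primer pair may amplify from more than one block.

```python
from typing import Dict, List, Optional, Tuple


def order_from_anchor(product_lengths: Dict[str, Optional[int]]) -> Tuple[List[str], List[str]]:
    """Order sites by amplicon length from an anchor primer; list sites with no product separately.

    product_lengths maps a site name to the product length in bp, or None if no product formed.
    """
    placed = sorted((length, site) for site, length in product_lengths.items() if length is not None)
    unplaced = sorted(site for site, length in product_lengths.items() if length is None)
    return [site for _, site in placed], unplaced


# Hypothetical amplicon lengths (bp) from an anchor primer to primers at other unique sites.
lengths = {"site_A": 11800, "site_B": 6200, "site_C": None, "site_D": 17500}
order, no_product = order_from_anchor(lengths)
print(order, no_product)  # ['site_B', 'site_A', 'site_D'] ['site_C']
```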
To search for microsatellite markers, L. vulgaris genomic DNA was sequenced on the Ion Torrent System (Thermo Fisher Scientific; sequencing performed by Azco BioTech, USA). The reads were used only to search for microsatellite motifs, because the low coverage does not allow these data to be used for genome assembly or for studying the T-DNA structure.
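Searching low-coverage reads for microsatellite motifs needs no assembly: each read can be scanned on its own. The sketch below is a minimal regex-based scanner for perfect tandem repeats of 2–6 bp motifs. The function name, the default minimum of five repeat units and the motif size range are illustrative assumptions, not the settings used in this study. Dedicated tools also handle imperfect repeats and compound loci. The scanner skips motifs that are themselves repeats of a shorter unit (e.g. ACAC) so that each locus is reported once with its shortest motif.

```python
import re


def find_microsatellites(read: str, min_repeats: int = 5, motif_sizes=range(2, 7)):
    """Return (start, motif, repeat_count) for perfect tandem repeats in one read."""
    read = read.upper()
    hits = []
    for k in motif_sizes:
        # A k-bp motif followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(read):
            motif = match.group(1)
            # A motif built from a shorter unit occurs inside its own doubled string
            # with the ends trimmed; skip it (it is reported with the shorter motif).
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return hits


print(find_microsatellites("TTG" + "AC" * 6 + "GTT"))  # [(3, 'AC', 6)]
```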
Digital PCR was performed by Azco BioTech (USA).