The Application of Nicotiana benthamiana as a Transient Expression Host to Clone the Coding Sequences of Plant Genes

Coding sequences (CDS) are commonly used for transient gene expression, for yeast two-hybrid screening to verify protein interactions, and for prokaryotic gene expression studies. CDS are most commonly obtained from complementary DNA (cDNA), which is generated by reverse transcription of messenger RNA (mRNA) extracted from plant tissues. However, some CDS are difficult to acquire in this way because the genes are expressed at extremely low levels or have specific spatial and/or temporal expression patterns in vivo. These challenges call for alternative CDS cloning technologies. In this study, we found that intron-containing genomic coding sequences (gDNA) from Arabidopsis thaliana, Oryza sativa, Brassica napus, and Glycine max can be correctly transcribed and spliced into mRNA in Nicotiana benthamiana. In contrast, gDNAs from Triticum aestivum and Sorghum bicolor did not function correctly. Because the target DNA sequence is driven by a constitutive promoter in transient expression experiments, a sufficient amount of mRNA can in theory be extracted from N. benthamiana leaves, which facilitates cloning the CDS of target genes. Our data demonstrate that N. benthamiana can be used as an effective host for cloning the CDS of plant genes.


Introduction
The transient expression system of Nicotiana benthamiana provides a rapid, high-yield [1], and cost-effective method for the synthesis of proteins and biopharmaceuticals [2][3][4][5]. Agrobacterium can be used to transport protein expression constructs into plant cells.
Only a few DNA fragments can be integrated into the chromosomes of plant cells and most foreign DNA molecules remain transcriptionally active for several days [6]. Transient gene expression studies have shown that maximum expression occurs within 18 hours (h) to 48 h after inoculation and can last for 10 days [7]. Transient expression systems are often used for the analysis of proteins including protein subcellular localization studies [8], co-immunoprecipitation (Co-IP) [9], and bimolecular fluorescence complementation (BiFC) [10,11].
Generally, transient expression constructs consist of a constitutive promoter, the CDS of the target gene, and tags for protein detection. To acquire the CDS of a target gene, mRNA extracted from plant tissues is reverse transcribed into cDNA. However, some CDS are difficult to amplify from cDNA for cloning because the target gene is expressed at extremely low levels or has a specific spatial and/or temporal expression pattern in plants. Alternative methods, such as chemical gene synthesis or the assembly of exons into a CDS [12,13], can be used, but chemical synthesis is relatively expensive, particularly for long CDS.
Also, it is difficult to assemble the CDS of genes that contain many exons, so alternative CDS cloning methods are needed.

Genomic DNA extraction and construction of vectors
Genomic DNA was extracted using the CTAB protocol [14]. The pUC19 plasmid was modified into a Gateway-compatible entry vector. In short, the gDNA sequence was amplified from plant genomic DNA and inserted into the modified pUC19 entry vector using one-step cloning technology. The entry vector and the expression vector pEarleyGate 101 were then recombined using the LR enzyme to produce the 35S::gDNA-YFP-HA protein expression construct for subsequent transient expression experiments [15]. The HA tag was used to detect the transiently expressed proteins.

Transient expression system of N. benthamiana
The transient expression of genes in N. benthamiana leaves was carried out as previously reported [16]. Briefly, one-month-old N. benthamiana plants were used for infiltration. All of the constructs were transformed into the Agrobacterium strain GV3101. The Agrobacterium strains were cultured overnight in liquid media, and the bacteria were collected and resuspended in induction medium (10 mM MES, pH 5.6, 10 mM MgCl2, 100 mM acetosyringone) to a final OD600 of 0.5. The bacterial suspension was infiltrated into N. benthamiana leaves using a 1 ml needleless syringe. The plants were kept in the dark for 24 h after infiltration.
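Adjusting a resuspended culture to a final OD600 of 0.5 is a C1V1 = C2V2 dilution. The following is a minimal sketch, assuming the OD600 of the resuspended bacteria was measured within the linear range of the spectrophotometer. The function name and the example values (a suspension at OD600 2.0 and a 10 ml final volume) are hypothetical and are not taken from the protocol.

```python
def dilution_volumes(stock_od600, target_od600, final_volume_ml):
    """Return (suspension_ml, medium_ml) needed to reach target_od600.

    Assumes OD600 scales linearly with cell density (C1 * V1 = C2 * V2).
    """
    if stock_od600 <= 0 or target_od600 <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if target_od600 > stock_od600:
        raise ValueError("suspension is more dilute than the target")
    suspension_ml = target_od600 * final_volume_ml / stock_od600
    return suspension_ml, final_volume_ml - suspension_ml


# Hypothetical example: suspension at OD600 = 2.0, target 0.5, 10 ml needed.
suspension_ml, medium_ml = dilution_volumes(2.0, 0.5, 10.0)
print(f"{suspension_ml:.2f} ml suspension + {medium_ml:.2f} ml induction medium")
# prints "2.50 ml suspension + 7.50 ml induction medium"
```

Dense cultures often fall outside the linear range, so in practice the suspension may need to be measured after a known pre-dilution.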

mRNA extraction and cDNA synthesis by reverse transcription
Total RNA was extracted from the plant leaves (0.2 g) using Trizol reagent and trichloromethane. The RNA was precipitated with isopropanol and dissolved in RNase-free water. DNA contaminants were removed by treating the RNA solution with RNase-free DNase at 37°C for 30 minutes (min). First-strand cDNA was synthesized from the total RNA using a RevertAid First Strand cDNA Synthesis Kit (Thermo Scientific) at 42°C for 60 min. The target CDS were amplified from the cDNA for cloning and transient expression. A reverse primer located in the HA sequence was used to specifically amplify the CDS of the transiently expressed gene.

Quantitative real-time PCR
Quantitative real-time PCR (qRT-PCR) was performed using SYBR Green Supermix (BIO-RAD) according to previously reported methods [17,18]. The housekeeping genes At_Actin2 (AT3G18780) in Arabidopsis, Nb_Actin3 (Niben101Scf03493g00020) in N. benthamiana, and Os_Actin (AB047313) in Oryza sativa were used as the respective internal controls. The relative expression levels were calculated using the 2^−ΔΔCT method as described previously [19]. All qRT-PCR experiments were carried out in triplicate together with the respective controls. A variant-specific forward primer was used to distinguish PLDγ3.1 from PLDγ3.2. The primer was located in the extra 90 bp intron (548-637) that is retained in PLDγ3.1 but spliced out in PLDγ3.2. For qRT-PCR, we first determined the total relative expression of PLDγ3. The relative expression of PLDγ3.1 was then determined with the variant-specific forward primer, and the relative expression of PLDγ3.2 was calculated as the difference between the total relative expression of PLDγ3 and that of PLDγ3.1.
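The calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script: the function names and all Ct values are hypothetical, triplicates are averaged before the ΔCT step, and the subtraction used for PLDγ3.2 assumes that both primer pairs amplify with comparable efficiency and are normalized to the same calibrator.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the internal control."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(sample_dct, calibrator_dct):
    """Relative expression by the 2^-ddCt method."""
    return 2 ** -(sample_dct - calibrator_dct)


def variant_by_difference(total_expression, variant1_expression):
    """Expression of the second variant as total minus the first variant."""
    difference = total_expression - variant1_expression
    if difference < 0:
        raise ValueError("variant expression exceeds total expression")
    return difference


# Hypothetical triplicate Ct values (not data from this study).
sample = delta_ct([24.0, 24.5, 23.5], [20.0, 20.0, 20.0])      # dCt = 4.0
calibrator = delta_ct([26.0, 26.0, 26.0], [20.0, 20.0, 20.0])  # dCt = 6.0
total_pld_g3 = relative_expression(sample, calibrator)
print(total_pld_g3)  # 4.0

# Hypothetical PLDγ3.1 value measured with the variant-specific primer.
pld_g3_1 = 1.5
pld_g3_2 = variant_by_difference(total_pld_g3, pld_g3_1)
print(pld_g3_2)  # 2.5
```

Because PLDγ3.2 is obtained by subtraction rather than measured directly, any measurement error in the total or in PLDγ3.1 carries over into the PLDγ3.2 estimate.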

Confocal laser scanning microscopy
A confocal laser scanning microscope (Leica SP8) was used to visualize YFP protein expression 40 h after infiltration at a wavelength of 514 nm.

gDNAs from Arabidopsis thaliana can be correctly transcribed and spliced into mRNA in N. benthamiana
The gDNAs of seven members of the phospholipase D (PLD) family in Arabidopsis (PLDα2, PLDβ1, PLDβ2, PLDδ, PLDε, PLDγ1, and PLDγ3) were cloned from genomic DNA extracted from Arabidopsis leaves. Each gDNA contained 2-9 introns (Tab. S1). The gDNAs were cloned into the pEarleyGate 101 expression vector, in which expression is driven by the constitutive CaMV 35S promoter. The YFP-HA coding sequence was fused to the 3' end of the gDNA, so only properly spliced mRNA could produce proteins carrying YFP-HA.
The expression constructs were transiently expressed in the leaves of N. benthamiana. PLDα2, PLDβ1, PLDβ2, PLDγ3, and PLDδ were detected with an anti-HA antibody (Fig. 1A). The protein expression and the mRNA splicing results indicated that the gDNAs from Arabidopsis were correctly transcribed and spliced into mRNA in N. benthamiana.
We analyzed the expression of these seven genes in Arabidopsis and N. benthamiana by qRT-PCR (Tab. S2). The seven native PLD genes were differentially expressed in the leaves of Arabidopsis (Fig. 1B). The expression level of PLDα2 was 500-fold lower than that of PLDγ1. Although PLDγ1 was expressed at the highest level of the seven genes,

Rice (Oryza sativa) is a staple crop and a model plant for research, and we used it to characterize NB-LRR genes. Three NB-LRR genes, LOC_Os08g28460, LOC_Os08g28540, and LOC_Os08g10260, were selected to study transient expression and splicing in N. benthamiana (Tab. S1). The YFP-HA tagged proteins [...] benthamiana were 349-fold and 15,818-fold higher than in rice, respectively (Fig. 3B). The expression of LOC_Os08g28540 was so low that its mRNA could not be detected in rice leaves, whereas its mRNA level in N. benthamiana was 2.4-fold higher than that of the control gene Nb_Actin3 (Fig. 3C).

Transient expression of gDNAs from other species in N. benthamiana
To further verify the applicability of the method described above, we randomly selected NB-LRR genes from other species, including two dicots (Brassica napus, Glycine max) and two monocots (Sorghum bicolor, Triticum aestivum) (Tab. S3).
Seven of the nine NB-LRR genes from Brassica napus and three of the five NB-LRR genes from Glycine max formed the predicted CDS. One of the four NB-LRR genes from Sorghum bicolor and one of the two NB-LRR genes from Triticum aestivum formed the predicted CDS. Whenever the predicted CDS could be isolated, expression of the corresponding protein could be detected in N. benthamiana (data not shown). The NB-LRR genes that did not form the predicted CDS in N. benthamiana were cloned and sequenced from the leaves of the corresponding species and compared to the sequences obtained from N. benthamiana. Interestingly, one or two introns of the predicted CDS were not spliced out of the mRNA of BnaA01g00570D, BnaA01g00670D, and Glyma.01G025400 in N. benthamiana, yet sequencing showed that the CDS of these three genes cloned from N. benthamiana were identical to those cloned from their native species (Tab. S3). These results indicate that these forms are new splice variants of these genes and that the splicing predictions for these genes need to be verified.
Overall, our results indicated that gDNAs from Arabidopsis thaliana, Oryza sativa, Brassica napus, and Glycine max can express correctly spliced mRNAs in a transient expression system of N. benthamiana, but gDNAs from Sorghum bicolor and Triticum aestivum do not function in this system.

The precursor messenger RNA of Arabidopsis PLDγ3 has spliced variants in N. benthamiana
There are two splice variants of PLDγ3 in Arabidopsis, PLDγ3.1 and PLDγ3.2 (Fig. 4A). An extra 90 bp intron (548-637) is spliced out in PLDγ3.2 compared to the sequence of PLDγ3.1. The two variants can be distinguished with specific primers (Fig. 4B). We tested whether the gDNA of PLDγ3 also forms splice variants in N. benthamiana. Our results showed that these two splice variants may also be present in N. benthamiana. We also measured the ratio of the two variants by qRT-PCR. The ratio of PLDγ3.2 to PLDγ3.1 was 1.24 in the leaves of N. benthamiana but 0.5 in the leaves of Arabidopsis, indicating that although Arabidopsis and N. benthamiana are both model plants, they differ in how this pre-mRNA is spliced (Fig. 4C).
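The relationship between the two variants, and why a primer placed inside the 90 bp region is specific to PLDγ3.1, can be illustrated in silico. The sketch below assumes that the coordinates 548-637 are 1-based, inclusive, and relative to the PLDγ3.1 sequence (637 − 548 + 1 = 90). The sequence is a synthetic placeholder and not the real PLDγ3 sequence, and the function and variable names are hypothetical.

```python
def splice_out(sequence, start, end):
    """Remove the 1-based, inclusive interval [start, end] from sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("interval lies outside the sequence")
    return sequence[: start - 1] + sequence[end:]


# Synthetic placeholder (NOT the real gene): 547 nt upstream, a 90 nt
# region at positions 548-637, and 363 nt downstream.
upstream = "A" * 547
extra_intron = "GT" + "C" * 86 + "AG"
downstream = "T" * 363

pld_g3_1 = upstream + extra_intron + downstream   # retains the 90 nt region
pld_g3_2 = splice_out(pld_g3_1, 548, 637)         # 90 nt region removed

# Hypothetical 20 nt forward primer lying entirely inside the 90 nt region.
variant_specific_primer = extra_intron[10:30]

print(len(pld_g3_1) - len(pld_g3_2))         # 90
print(variant_specific_primer in pld_g3_1)   # True
print(variant_specific_primer in pld_g3_2)   # False
```

A primer of this kind amplifies only PLDγ3.1, which is why the PLDγ3.2 level has to be derived from the total PLDγ3 signal rather than measured directly.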

Discussion
In this study, we demonstrate an attractive alternative approach for cloning the CDS of genes that are difficult to clone using conventional methods. Our approach has several distinct advantages over established systems. gDNA can be easily amplified from plant genomic DNA. CDS are usually cloned from cDNA, but this depends on the presence and abundance of the target mRNA in the extracted plant tissues. This process is challenging for genes that are expressed at extremely low levels or that have spatial and/or temporal expression patterns in native plants (e.g. NB-LRR genes). However, the efficiency of expressing proteins with gDNA may be lower compared to CDS as the synthesis of the

Supplementary material
Tab. S1 Selected genes from Arabidopsis and Oryza sativa.
Tab. S2 Primers used for PCR and qRT-PCR.
Tab. S3 Selected genes from other dicot and monocot plants.