Cell-free synthesis of natural compounds from genomic DNA of biosynthetic gene clusters

A variety of chemicals can be produced in a living host cell via optimized and engineered biosynthetic pathways. Despite these successes, pathway engineering remains demanding and is sometimes impossible, owing to the lack of specific functions or substrates in the host cell, the sensitivity of vital physiological processes to heterologous components, or constrained mass transfer across the membrane. In this study, we demonstrate that cell-free systems can be useful in driving the characterization and engineering of biosynthetic pathways. We show that complex multidomain proteins involved in natural compound biosynthesis can be produced from encoding DNA in vitro in a minimally complex PURE system to directly run multistep reactions. We prove the concept of this approach with the direct synthesis of indigoidine and rhabdopeptides by the in vitro produced multidomain megasynthases BpsA and KJ12ABC. The in vitro produced proteins are analyzed in detail, i.e., in yield, quality, post-translational modification and specific activity, and compared to recombinantly produced proteins. Our study highlights cell-free PURE systems as a suitable setting for the rapid engineering of biosynthetic pathways.


Introduction
Genome mining has become the main driver in the discovery of new natural products [1][2]. However, while "orphan clusters" have become available with the advent of next-generation sequencing technologies and bioinformatics tools, the substances they encode can often be neither identified nor extracted under laboratory conditions. Cultivation of the native producer strain in the laboratory is of limited success, because the native conditions cannot be reconstituted in the lab [3], and robust heterologous production of the compound in host organisms fails owing to complex native regulation and possibly toxic products [4][5].
Concomitantly, the proteins encoded by gene clusters escape enzymatic characterization, catalytic mechanisms remain unsolved and protein engineering for expanding the product spectra to novel new-to-nature compounds remains unachievable.
Facing this problem, we sought to establish a cell-free/ex vivo approach based on the PURE system for analyzing and engineering biosynthetic gene clusters. Such an approach is beneficial for the analysis and design of biosynthetic pathways for three main reasons: first, it prevents the biosynthetic pathway from interfering with physiological processes of a host cell; second, it is of minimal complexity, allowing for simple and direct read-out of enzymatic properties; and third, it is an open system that is not constrained to canonical compounds (substrates, cofactors, or assisting and tailoring enzymes). Overall, a minimally complex cell-free system could improve access to key information about native and engineered pathways. In evaluating the PURE system for the cell-free synthesis of natural compounds from genomic DNA, we worked with the commercially available E. coli-based PURExpress In Vitro Protein Synthesis (IVPS) Kit (New England Biolabs, USA) as a "reaction solution" for gene expression and product formation [6]. We posited that the E. coli-based system provides a suitable setting, because a multitude of proteins from biosynthetic gene clusters have been successfully produced in E. coli [7][8].
In our approach, we specifically focused on the protein class of megasynthases, which are responsible for the synthesis of non-ribosomal peptides (NRPs) and polyketides (PKs) [9][10][11]. The megasynthases, termed NRP synthetases (NRPSs) and PK synthases (PKSs), are organized in modules comprising integrated enzymatic domains that catalyze the individual reaction steps in biosynthesis [12][13]. PKSs and NRPSs can occur as monomodular or multimodular systems, and the genes encoding multimodular megasynthases typically cluster in the genome. Intriguingly, there is co-linearity between the order of the genes encoding the modules of a multimodular megasynthase, the sequence of the reaction steps assembling the compound, and the identity of the product. Megasynthase gene clusters are prototypical candidates for identification by genome mining, because their inherent modularity leads to gene clusters with repeating patterns. Further, the paradigm of co-linearity leads to accurate predictions about the identity of the compound [14][15], which has put megasynthases at the forefront of engineering machineries for the programmable multistep synthesis of novel compounds with new bioactivities [16][17][18]. Owing to the relevance of PKs and NRPs as pharmaceuticals, simple and fast access to the products of native and engineered megasynthases is in high demand.
In this report, we demonstrate the cell-free production of the NRPS BpsA from Streptomyces lavendulae [19][20] and the RXP (rhabdopeptide-like peptide) producing NRPS KJ12ABC from Xenorhabdus KJ12.1 [21] in the PURE system, as well as the direct production of their natural products indigoidine and rhabdopeptides, respectively (Figure 1A & B). We further show that other megasynthases (including the PKS-related fatty acid synthases (FASs)) can be produced and activated by post-translational modification. The successful translation of the selected genes into natural products demonstrates the general applicability of PURE systems as a cell-free platform for studying megasynthases and the biosynthesis of their natural products. The presented approach can be directly applied to other proteins involved in natural compound synthesis.
Valine is the preferred substrate of both elongation modules KJ12A and KJ12B. The second module KJ12B is used iteratively in this assembly line with a relaxed methyltransferase (MT) activity. KJ12ABC produces a broad spectrum of compounds [21].

Results & Discussion
To establish an in vitro platform for megasynthase production and biosynthesis, the monomodular NRPS protein BpsA was produced in vitro using the commercially available PURExpress In Vitro Protein Synthesis Kit (New England Biolabs, USA) [6]. This IVPS system is reconstituted from pure components in defined concentrations and is based on the E. coli transcription/translation machinery. BpsA served as the main model system in our study, because it is a monomodular NRPS and produces the blue pigment indigoidine, which allows direct monitoring of product synthesis by spectroscopic means. Full-length BpsA (UniProt accession Q1MWN4, N-terminal Strep-tag [23]) was successfully synthesized in vitro, as confirmed by bands of the correct molecular weight in SDS-PAGE and by Western blotting (Figure 2A). Size exclusion chromatography (SEC) showed that in vitro synthesized BpsA elutes at the same apparent mass as the recombinantly produced reference (Figure 2B). Post-translational phosphopantetheinylation of the thiolation (T) domain, an obligatory modification for enabling substrate shuttling, was performed by the promiscuous 4'-phosphopantetheinyl transferase Sfp from Bacillus subtilis and coenzyme A (CoA) [24][25], both of which were added to the solution. Protein yields after 2 h of synthesis at 30 °C were on average 25.9 ± 4.0 µg/mL when the protein was first synthesized and then phosphopantetheinylated (sequential reaction), and 15.3 ± 2.4 µg/mL when the protein was phosphopantetheinylated while being synthesized (parallel reaction) (Figure 2C and Figure S1). We have not further investigated the origin of the difference in expression yields between the sequential and the parallel protocol.
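To relate the reported mass yields to the picomole amounts used for gel calibration, the yields can be converted to molar concentrations. The following is a minimal sketch, assuming the molecular weight of Strep-tagged BpsA stated in the Figure 2 legend (142.7 kDa) and the 25 µL reaction volume from the Methods; the function names are illustrative and not part of the original work.

```python
# Molecular weight of Strep-tagged BpsA, taken from the Figure 2 legend.
BPSA_MW_KDA = 142.7


def ug_per_ml_to_nM(conc_ug_per_ml: float, mw_kda: float) -> float:
    """Convert a mass concentration (µg/mL) to a molar concentration (nM).

    1 µg/mL equals 1 mg/L and 1 kDa equals 1000 g/mol, so
    c[nM] = c[µg/mL] / MW[kDa] * 1000.
    """
    return conc_ug_per_ml / mw_kda * 1000.0


def amount_pmol(conc_nM: float, volume_uL: float) -> float:
    """Amount of protein in pmol: nM * µL gives fmol, divided by 1000 gives pmol."""
    return conc_nM * volume_uL / 1000.0


if __name__ == "__main__":
    # Mean yields reported in the text for the two protocols (µg/mL).
    for label, yield_ug_per_ml in (("sequential", 25.9), ("parallel", 15.3)):
        conc = ug_per_ml_to_nM(yield_ug_per_ml, BPSA_MW_KDA)
        # 25 µL is the reaction volume given in the Methods.
        print(f"{label}: {conc:.0f} nM, {amount_pmol(conc, 25.0):.1f} pmol per 25 µL")
```

Under these assumptions, both protocols yield BpsA at concentrations in the range of roughly 100 to 200 nM.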
The susceptibility of in vitro synthesized BpsA to phosphopantetheinylation was determined by labeling with the fluorescent CoA-647 and in-gel fluorescence read-out [26]. We assumed that the degree of labeling (DOL) with CoA-647 is a good measure of the phosphopantetheinylation efficiency, because Sfp tolerates CoA and CoA derivatives [27][28]. We found DOLs of 108.1 ± 8.3 % and 127.2 ± 8.4 % for the sequential and the parallel protocol, respectively, relative to recombinantly produced BpsA as reference (see Figure 2C). DOLs exceeding 100 % for the in vitro synthesized BpsA may be explained by in vivo phosphopantetheinylation of a small fraction of the recombinantly produced BpsA used as reference. This already phosphopantetheinylated fraction is unavailable for CoA-647 labeling, which leads to a systematic overestimation of the DOLs of cell-free produced BpsA. IndC from Photorhabdus luminescens was also expressed and phosphopantetheinylated in vitro to biosynthesize indigoidine [29]. However, a quantitative analysis was performed with BpsA only (Figure S4C). In light of the successful proof of concept, a set of megasynthases (NRPSs, PKSs, FASs) was screened for synthesis by the PURE cell-free system with coupled phosphopantetheinylation (sequential protocol, see Figure S1). We were able to synthesize and CoA-647-label all megasynthases included in this screen: Penicillium patulum MSAS [30], RAPS module 14 [31], PikA module 5 (PikAIII) [32], DEBS module 4 [7], three variants of murine FAS (wild-type, KS-MAT-ACP-TE, and with C-terminal GFP (FAS-GFP)) [33][34], GrsA [35], TycB1 [36], and KJ12ABC from Xenorhabdus KJ12.1 [21] (Figure S5A & B).
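The argument for DOLs above 100 % can be made explicit with a simple model. The sketch below assumes that a fraction f of the recombinant reference is already modified in vivo and therefore cannot be labeled, so that the apparent DOL equals the true DOL divided by (1 - f). This model, the function names, and the default of a fully labeled in vitro protein are illustrative assumptions, not an analysis performed in the study.

```python
def apparent_dol(sample_signal: float, reference_signal: float) -> float:
    """Apparent DOL (%) of a sample relative to a reference at equal protein amount."""
    if reference_signal <= 0:
        raise ValueError("reference signal must be positive")
    return 100.0 * sample_signal / reference_signal


def implied_holo_fraction(apparent_dol_pct: float, true_dol_pct: float = 100.0) -> float:
    """Fraction of the reference that would have to be pre-modified in vivo.

    Model (assumption): only the fraction (1 - f) of the reference can take up
    CoA-647, hence apparent = true / (1 - f) and f = 1 - true / apparent.
    """
    if apparent_dol_pct <= 0:
        raise ValueError("apparent DOL must be positive")
    return 1.0 - true_dol_pct / apparent_dol_pct


if __name__ == "__main__":
    # Mean apparent DOLs reported in the text for the two protocols.
    for label, dol in (("sequential", 108.1), ("parallel", 127.2)):
        f = implied_holo_fraction(dol)
        print(f"{label}: apparent DOL {dol} % -> implied pre-modified fraction {f:.1%}")
```

Because the in vitro protein need not be fully labeled, the returned value is best read as a rough estimate under the stated model rather than a measured quantity.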
The murine FAS construct DH-KR-TE and the NRPS module KJ12C, which do not harbor a carrier protein domain, were not fluorescently labeled, as expected (Figure S5B). The successful phosphopantetheinylation suggests that the proteins are correctly folded.
In seeking to probe the PURE cell-free system for the production of another megasynthase, we decided to work with the RXP-synthesizing NRPS KJ12ABC from the bacterium Xenorhabdus KJ12.1 [21]. The RXP biosynthetic gene cluster of Xenorhabdus KJ12.1 encodes three NRPS modules: kj12A encoding a C-A-T module (137.8 kDa), kj12B encoding a C-A/MT-T module (181.5 kDa), and kj12C encoding a stand-alone Cterm domain (62.3 kDa) (see Figure 1B). The RXP-synthesizing NRPS represented the most complex megasynthase system used in our study, and the individual proteins were available in good yields as identified in our screen (see Figure S5B). As a reference for the analysis of KJ12ABC from the PURE cell-free system, the modules were recombinantly expressed in E. coli and subjected to a product synthesis assay (Figure S6). Specifically, we focused on KJ12B, which was available in the second-highest yields after the indigoidine-producing monomodular BpsA and IndC (see Figure S5B). The production yields of modules KJ12B and C were estimated by densitometric analysis. The degree of phosphopantetheinylation of KJ12B was determined with the fluorescent analogue CoA-647 and in-gel fluorescence intensities, as performed for BpsA. To avoid compromised activity of the in vitro synthesized proteins in the PURExpress reaction solution, as observed for BpsA, we reverse-purified KJ12B and C by Ni-chelating chromatography. Since all proteins of the PURE system (except for the ribosomes) are His-tagged, the Strep-tagged in vitro synthesized NRPS modules appear in the flow-through, allowing for rapid purification (Figure 4A). Module KJ12A was added in excess as recombinantly produced and purified protein. We expected that elevated concentrations of the first, chain-initiating module KJ12A would shift the output spectrum to shorter peptides (with fewer methylated valine residues) at high abundance.
Under the reaction conditions, which correspond to a molar ratio of 10:1:1 (KJ12A:B:C) for the modules, we eventually identified three peptides: the tetrapeptide mV-V-mV-mV-PEA (RXP number 3, see Figure S6) in highest abundance, and two other tetrapeptides of sequences V-V-V-mV-PEA (9) and V-mV-V-mV-PEA (10) (Figure 4B & Figure S7). Since KJ12A alone and in combination with KJ12C cannot produce those peptides (see Figure S6), our data reveal the functionality of the in vitro synthesized components KJ12B and C, and underline the suitability of the PURE system as an IVPS platform for megasynthase production and analysis.
In conclusion, we demonstrate that the PURE system can provide a superior setting for the analysis and engineering of biosynthetic pathways, which is not possible with cell-free synthesis in E. coli cell extract [20]. We show that IVPS can pave the way towards a rapid cell-free screening platform for natural compound discovery: genome mining identifies the (megasynthase) gene cluster → gene synthesis provides the DNA → the proteins are produced in the PURE system → enzymatic properties and product output are directly monitored. Despite the advantages of IVPS with the PURE system, its broad application hinges on solutions to several key challenges. Providing the protein in sufficient yield and quality may be the biggest current limitation: in probing the synthesis of several megasynthases, we observe different yields (see Figure S5A & B), and the in-depth analysis of in vitro BpsA production indicates that the quality of the protein is slightly compromised (see Figure 3A & B). Yields and quality of proteins may be further improved by optimizing the encoding DNA in untranslated and translated regions, e.g., in the secondary structure of DNA/RNA or in codon usage [37][38]. A broader reconstitution of the protein quality control machinery is generally possible, but will need to be balanced against the benefit of a low-complexity environment for controlling and monitoring reaction progress and output. In this respect, the capability of E. coli to produce an array of megasynthases is good news for the applicability of the E. coli-based PURE system [6]. It implies that the folding and assembly of megasynthases do not generally rely on specific factors of the original producer strains. A further limitation arises from the current high cost of IVPS, particularly of systems reconstituted from pure components. However, recent work promises that access to PURE cell-free systems will improve and become less costly in the near future [39].

Recombinant production of BpsA in vivo:
The bpsA gene with N-terminal Strep-tag was placed in a pET22b plasmid with ampicillin resistance. BpsA was expressed in BL21 Gold production (KJ12A and KJ12B), followed by incubation at 37 °C with shaking at 180 rpm.
After the cultures grew to an OD600 of 0.5-0.7, they were kept growing for 72 h at 180 rpm and 20 °C. The cells were harvested by centrifugation (10,000 rpm, 10 min,
Accordingly, the molar ratio of modules in the reaction solution was approximately 10:1:1 (A:B:C). The reaction was developed and prepared for MS analysis as described below.
HPLC-MS analysis of RXP production: 5 μL of the crude extracts were injected and analyzed via ESI-HPLC-MS on a Dionex UltiMate 3000 HPLC system coupled to a Bruker AmaZon X mass spectrometer, using an ACQUITY UPLC™ BEH C18 column (130 Å, 2.1 mm × 100 mm, 1.7 μm particle size, Waters GmbH) at a flow rate of 0.6 mL/min for 16 min, with acetonitrile (ACN) and water supplemented with 0.1 % formic acid (v/v) in a gradient ranging from 5 % to 95 % ACN. For RXP detection, positive mode with a scanning range of 100-1200 m/z and UV detection at 200-600 nm were used. The software DataAnalysis 4.3 (Bruker) was used to evaluate the HPLC-MS measurements.
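For extracting ion chromatograms in positive mode, the expected m/z of a rhabdopeptide can be estimated from its sequence. The sketch below is illustrative only. It assumes linear peptides with a free N-terminus, a C-terminal amide bond to phenylethylamine (PEA), "mV" denoting N-methylvaline, and singly protonated ions [M+H]+; the function names are hypothetical, and the values are calculated, not measured.

```python
# Monoisotopic atomic masses (u) and the proton mass.
ATOM_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON_MASS = 1.00727646

# Elemental compositions of the building blocks (residue = amino acid minus H2O).
RESIDUE_FORMULA = {
    "V": {"C": 5, "H": 9, "N": 1, "O": 1},    # valine residue
    "mV": {"C": 6, "H": 11, "N": 1, "O": 1},  # N-methylvaline residue (assumed)
}
PEA_FORMULA = {"C": 8, "H": 11, "N": 1}       # phenylethylamine


def formula_mass(formula: dict) -> float:
    """Monoisotopic mass of an elemental composition."""
    return sum(ATOM_MASS[element] * count for element, count in formula.items())


def rxp_mz(sequence: str) -> float:
    """m/z of [M+H]+ for a sequence such as 'mV-V-mV-mV-PEA'.

    Neutral mass = sum of residue masses + PEA: the N-terminal H and the
    PEA group attached via its nitrogen together add up to C8H11N.
    """
    parts = sequence.split("-")
    if len(parts) < 2 or parts[-1] != "PEA":
        raise ValueError("sequence must end in '-PEA'")
    neutral = sum(formula_mass(RESIDUE_FORMULA[p]) for p in parts[:-1])
    neutral += formula_mass(PEA_FORMULA)
    return neutral + PROTON_MASS


if __name__ == "__main__":
    for name, seq in (("3", "mV-V-mV-mV-PEA"), ("9", "V-V-V-mV-PEA"), ("10", "V-mV-V-mV-PEA")):
        print(f"RXP {name} ({seq}): [M+H]+ = {rxp_mz(seq):.3f}")
```

Under these assumptions, all three peptides fall well within the 100-1200 m/z scanning range.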

Figure 1. NRPS-mediated indigoidine and rhabdopeptide synthesis. NRPS modules comprise three core domains: a condensation (C) domain, an adenylation (A) domain, and a thiolation (T) domain. Additional domains release the compound and may be present for further processing.

(A) BpsA-mediated indigoidine biosynthesis. The A domain incorporates L-glutamine, which then undergoes internal cyclization. The product is released by a thioesterase (TE) domain. The oxidation (Ox) domain presumably oxidizes the cyclized product. Further oxidation dimerizes the intermediate to form the blue pigment indigoidine [22]. (B) The RXP-producing NRPS from Xenorhabdus KJ12.1, termed KJ12ABC. The RXP biosynthetic gene cluster encodes three NRPS modules. The stand-alone Cterm domain uses phenylethylamine (PEA) for peptide chain release.

Figure 2. Synthesis of BpsA with the PURE cell-free system. (A) Expression control by Western blotting with anti-Strep antibodies, performed on three independent reaction solutions (#1-3). BpsA was applied as holo-protein, produced by IVPS with simultaneous phosphopantetheinylation. Self-cast 9 % Tris-Tricine gel. Strep-tagged BpsA has a molecular weight of 142.7 kDa. For the uncropped blot, see Figure S2A. (B) SEC profiles and Western blot detection of elution fractions: (top) recombinantly produced BpsA and (bottom) IVPS reaction solution including phosphopantetheinylation. (C) Quantification of protein production yields and phosphopantetheinylation efficiency. BpsA was first produced by IVPS and then phosphopantetheinylated with Sfp and CoA-647 (purchased from NEB). Samples from three independent reactions (#1-3) were applied in duplicate (a & b). For calibration, recombinantly produced BpsA, diluted in the PURExpress reaction solution, was loaded in amounts of 1.25, 0.63, 0.31 and 0.16 pmol. 9 % Tris-Tricine gel as in panel A. For the uncropped gels, see Figure S2B. Overall, three times three reactions, each applied in duplicate (18 bands), were used for quantification of BpsA production and phosphopantetheinylation for the parallel and the sequential protocol, respectively (Figure S3A-C).
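The calibration described in panel C can be illustrated with a linear fit of band intensity against loaded amount. This is a minimal sketch, assuming a linear densitometric response over the calibration range; the loaded amounts are those given in the legend, whereas the band intensities and function names are hypothetical placeholders, not data from the study.

```python
def fit_line(x: list, y: list) -> tuple:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_intensity(intensity: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to obtain the loaded amount (pmol)."""
    return (intensity - intercept) / slope


if __name__ == "__main__":
    # Loaded reference amounts from the Figure 2C legend (pmol).
    calib_pmol = [1.25, 0.63, 0.31, 0.16]
    # Hypothetical band intensities (arbitrary units) for illustration only.
    calib_intensity = [12600.0, 6500.0, 3300.0, 1750.0]

    slope, intercept = fit_line(calib_pmol, calib_intensity)
    sample = amount_from_intensity(5000.0, slope, intercept)
    print(f"slope = {slope:.0f} a.u./pmol, sample = {sample:.2f} pmol")
```

Sample bands are only meaningfully quantified when their intensities fall within the range spanned by the calibration bands.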

Figure 3. Cell-free synthesis of indigoidine. (A) Production of indigoidine followed spectroscopically at 600 nm and compared to turnover rates of each protein preparation (see legend). Data show one of four independent experiments (for more data, see Figure S4A). (B) Turnover rates were calculated from the maximal slope of the product formation curves (shown in panel A). Data were collected on four independent protein preparations. The error bars represent the standard deviation (n = 4) of the maximal rates.
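The maximal-slope analysis in panel B can be sketched numerically. The example below assumes a sampled absorbance time course and locates the maximum of the first derivative by central differences, which for a sigmoidal curve coincides with the inflection point found via the second derivative. The function names are illustrative, and the result stays in absorbance units per minute per µM protein because no extinction coefficient for indigoidine is given here.

```python
import math


def max_slope(times: list, values: list) -> tuple:
    """Return (time, slope) at the steepest point, using central differences."""
    if len(times) != len(values) or len(times) < 3:
        raise ValueError("need at least three paired data points")
    best_time, best_slope = times[1], -math.inf
    for i in range(1, len(times) - 1):
        slope = (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        if slope > best_slope:
            best_time, best_slope = times[i], slope
    return best_time, best_slope


def normalized_max_rate(times_min: list, a600: list, protein_uM: float) -> float:
    """Maximal rate of A600 increase per µM protein (absorbance units / min / µM)."""
    if protein_uM <= 0:
        raise ValueError("protein concentration must be positive")
    return max_slope(times_min, a600)[1] / protein_uM


if __name__ == "__main__":
    # Synthetic sigmoidal curve for illustration (not measured data).
    t = [0.5 * i for i in range(81)]                          # 0 to 40 min
    a = [1.0 / (1.0 + math.exp(-0.5 * (ti - 20.0))) for ti in t]
    t_infl, slope = max_slope(t, a)
    print(f"inflection near t = {t_infl} min, max slope = {slope:.4f} A600/min")
    print(f"normalized rate = {normalized_max_rate(t, a, 0.18):.3f} A600/min/µM")
```

Noisy measurements would typically need smoothing or a curve fit before differentiation, which this sketch omits.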

Figure 4. Cell-free synthesis of rhabdopeptides. (A) SDS-PAGE of IVPS of KJ12B and KJ12C. Composition of reaction solutions as indicated. (B) Specific extracted-ion chromatograms (EICs) of the different HPLC-MS analyses, color-coded as outlined in (A). The structures of the detected rhabdopeptides are shown. For MS-MS fragmentation data of compounds 3, 9 and 10, see Figure S7.

4 °C). For purification, cell pellets were resuspended in 50-100 mL Strep-tag binding buffer (100 mM Tris-HCl, 150 mM NaCl, pH 8.0, sterile filtered) supplemented with one protease inhibitor tablet (Roche), 0.1 % Triton X-100, 0.5 mg/mL lysozyme (10 U/mL) and 3 μL Benzonase® Nuclease (25 U/μL), incubated for 30 min at RT, and additionally lysed by sonication. Cell debris was removed from the lysate by centrifugation (20,000 rpm, 30 min, 4 °C), and the lysate was loaded onto a StrepTrap HP 5 mL column (GE Healthcare) and purified with the ÄKTA™ purifier system (GE Healthcare) or NGC system (BioRad). 50-100 mL protein lysate was loaded onto the column, which had been equilibrated with two column volumes (CV) of Strep-tag binding buffer, at a flow rate of 2.5 mL/min with fraction collection of 12 mL. The column was washed with 12 CV Strep-tag binding buffer at a flow rate of 5 mL/min to remove unspecifically bound proteins. The protein of interest was eluted with 8 CV of Strep-tag elution buffer (100 mM Tris-HCl, 150 mM NaCl, 2.5 mM desthiobiotin, pH 8.0, sterile filtered) at a flow rate of 5 mL/min, and fractions of 5 mL were collected. The column was regenerated with 3 CV regeneration buffer (Strep-tag binding buffer with 1 mM HABA, pH 8.0, sterile filtered) and an additional 5 CV Strep-tag binding buffer. Fractions containing the protein of interest were pooled and concentrated with Centriprep units, MWCO = 50 kDa (Merck Millipore) for KJ12A and KJ12B or MWCO = 30 kDa (Merck Millipore) for KJ12C.
In vitro protein synthesis and phosphopantetheinylation: IVPS was performed using the PURExpress In Vitro Protein Synthesis Kit (NEB) with supplements (16 U RNase inhibitor (NEB), 5 mM FMN). 50-100 ng DNA template was provided in 25 µL reaction volume and incubated for 2 h at 30 °C. The reaction was stopped on ice by adding 50 µg/mL kanamycin as ribosome inhibitor. The IVPS reaction could also be scaled up to 100 µL. Phosphopantetheinylation was performed either in parallel to the IVPS or after IVPS by adding Sfp (final concentration 0.5 µM) and CoA (final concentration 100 µM) to the reaction. For simultaneous phosphopantetheinylation (parallel protocol), the reaction mixture was incubated for another 15 min at 30 °C after ribosome inhibition by kanamycin (final concentration of 50 µg/mL); for subsequent phosphopantetheinylation (sequential protocol), it was incubated for 1 h at 30 °C. The reaction was stopped on ice. Phosphopantetheinylation was quantified from in-gel fluorescence intensities, using fluorescent CoA 647 (NEB) as substrate. Protein expression yields were quantified from SDS-PAGE by densitometric analysis, using unfolded in vivo expressed and purified apo-BpsA as reference. Western blots were treated with antibodies against the Strep-tag and analyzed by fluorescence detection. StrepMAB-Classic from mouse (IBA) was used as primary antibody, and Donkey anti-Mouse IgG DyLight 755 conjugate (Thermo Fisher Scientific) as secondary antibody.
Indigoidine synthesis: Cell-free synthesis of indigoidine was performed by adding 0.65 mM ATP and L-glutamine and 0.8 mM MgCl2 to the phosphopantetheinylated IVPS mixture and incubating for 1 h at RT. The synthesis of indigoidine was monitored by measuring the absorbance at 600 nm over time, in cuvettes at the NanoDrop (Thermo Fisher Scientific) or in 384-well plates at the CLARIOstar (BMG). Turnover rates were determined by calculating the point of maximal slope (inflection point) of the sigmoidal curves from the second derivative and were normalized to protein concentration.
RXP synthesis: For in vitro reactions, 10 mM MgCl2, 5 mM L-valine, 2 mM S-adenosylmethionine (SAM) and 3 mM PEA were first added to 1.5 mL reaction tubes. The proteins KJ12A, KJ12B and KJ12C, in ratios of KJ12A:KJ12C 1:1 (2 μM/2 μM), KJ12A:KJ12B 1:1 (2 μM/2 μM), KJ12B:KJ12C 1:10 (2 μM/20 μM), 1:1 (2 μM/2 μM), and 10:1 (20 μM/2 μM), as well as KJ12A:KJ12B:KJ12C 1:1:1 (2 μM/2 μM/2 μM) and 1:10:1 (2 μM/20 μM/2 μM), were then separately added to the above-mentioned reaction tubes to check the RXP profiles. Finally, the total volume was adjusted to 400 µL with assay buffer (50 mM Tris-HCl, 50 mM NaCl, pH 8.0) and the reaction was started by addition of 3 mM ATP. The negative controls, one without any enzyme and the others with KJ12A, KJ12B or KJ12C alone, were performed in the same way. All reaction tubes were incubated at 22.5 °C, 250 rpm for 20 h. After incubation, the samples were extracted with one volume of methanol for 1 h at RT under shaking and analyzed by HPLC-MS.
Cell-free synthesis was performed in reaction volumes of 125 µL. Phosphopantetheinylation was performed in sequential manner. By taking a PURExpress protein band as reference, whose concentration had been determined beforehand with the BpsA calibration curve, the concentrations of KJ12B and KJ12C were determined to be 0.02 and 0.015 µM, respectively. After cell-free synthesis, reaction solutions for modules KJ12B and KJ12C were combined and inversely purified with Ni-NTA Magnetic Beads (Thermo Fisher) using assay buffer containing 30 mM imidazole and 0.05 % Tween-20. Strep-tagged NRPS modules were collected in the flow-through. Elution of His-tagged proteins from the reaction solution was performed with assay buffer containing 200 mM imidazole and 0.05 % Tween-20. The elution fraction served as negative control. For cell-free synthesis, recombinantly produced KJ12A was added to the inversely purified combined reaction solution of KJ12B and C at a final concentration of 0.16 µM.
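The approximate 10:1:1 module ratio quoted for the cell-free RXP reaction can be checked from the concentrations stated above. The following is a minimal sketch, assuming that the three values (0.16, 0.02 and 0.015 µM) all refer to the same final reaction mixture; the function name is illustrative.

```python
def molar_ratio(concentrations_uM: dict, reference: str) -> dict:
    """Normalize module concentrations to a chosen reference module."""
    ref = concentrations_uM[reference]
    if ref <= 0:
        raise ValueError("reference concentration must be positive")
    return {name: conc / ref for name, conc in concentrations_uM.items()}


if __name__ == "__main__":
    # Module concentrations as stated in the Methods (µM).
    modules = {"KJ12A": 0.16, "KJ12B": 0.02, "KJ12C": 0.015}
    ratio = molar_ratio(modules, reference="KJ12B")
    print(" : ".join(f"{name} {value:.2f}" for name, value in ratio.items()))
```

Normalized to KJ12B, the stated concentrations give about 8 : 1 : 0.75, which is in line with the approximate 10:1:1 ratio given for the densitometrically estimated yields.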