Complete Reconstitution and Deorphanization of the 3 MDa NOCAP (NOCardiosis-Associated Polyketide) Synthase

Several Nocardia strains associated with nocardiosis, a potentially life-threatening disease, house a nonamodular assembly-line polyketide synthase (PKS) that presumably synthesizes an unknown natural product. Here, we report the discovery and structure elucidation of the NOCAP (NOCardiosis-Associated Polyketide) aglycone by first fully reconstituting the NOCAP synthase in vitro from purified protein components followed by heterologous expression in E. coli and spectroscopic analysis of the purified products. The NOCAP aglycone has an unprecedented structure comprised of a substituted resorcylaldehyde headgroup linked to a 15-carbon tail that harbors two conjugated all-trans trienes separated by a stereogenic hydroxyl group. This report is the first example of reconstituting a trans-acyltransferase assembly-line PKS either in vitro or in E. coli, and of using these approaches to “deorphanize” a complete assembly-line PKS identified via genomic sequencing. With the NOCAP aglycone in hand, the stage is set for understanding how this PKS and associated tailoring enzymes confer an advantage to their native hosts during human Nocardia infections.

Within the past decade, genomic sequencing has exposed many "orphan" biosynthetic gene clusters encoding assembly-line PKSs whose products have yet to be identified 1. Analysis of orphan PKSs has the potential to reveal new biosynthetic strategies as well as products with unprecedented structures and biological activities. Of particular interest to our laboratory is an intriguing family of orphan assembly-line PKSs termed NOCAP synthases. NOCAP synthases harbor cis- and trans-acyltransferases 2, and are invariably found in strains of the actinomycete Nocardia isolated from patients affected with nocardiosis, a serious pulmonary or systemic disease [3][4][5] (Table S1). The NOCAP synthase is composed of four separate proteins (NOCAP_PKS1-4) containing nine PKS modules (eight of which are collinear) (Figure 1). Modules 1 and 3 possess their own acyltransferase domains, whereas the remaining modules require a trans-acyltransferase (tAT) to supply malonyl extender units. Notably, this PKS has three other unusual features: a) a "split and stuttering" module capable of catalyzing three elongation and reductive cycles (module 5) 5,8,9, b) a terminal thioester reductase (TR) 10, and c) a thioesterase (TE) domain fused to the tAT. In a preliminary study, several unprecedented, albeit partially characterized, octaketide and heptaketide products were generated by incubating purified modules 4-8 with the surrogate primer unit octanoyl-CoA 5. Building on our laboratory's prior experience in functionally reconstituting the complete 6-deoxyerythronolide B synthase (DEBS) in E. coli 6 and from purified protein components 7, we sought to deorphanize a prototypical NOCAP synthase outside of its genetically difficult and potentially hazardous natural host using both of these approaches. Here we report the successful reconstitution of the entire assembly-line NOCAP synthase in vitro as well as in E. coli.
We hypothesized that the uncharacterized module "X" synthesizes a primer unit for the collinear assembly line comprised of modules 1-8. Accordingly, we expressed a soluble maltose-binding protein (MBP)-module X fusion protein in E. coli, and purified it to homogeneity (Figure S1). To identify the substrates and products bound to its acyl carrier protein (ACP) domain by intact protein LC-MS, we further expressed and purified two derivatives of this protein: MBP-module X without the ACP domain (KSX), and the stand-alone ACPX (Figure S2). Apo-ACPX was incubated with the Sfp phosphopantetheinyl transferase 11 and malonyl-CoA to obtain malonyl-S-ACPX, which was then incubated with KSX. LC-MS analysis revealed that malonyl-S-ACPX was predominantly decarboxylated to acetyl-S-ACPX in a KSX-dependent manner (Figure 2a), suggesting that KSX is condensation-incompetent (hence the designation KS0 from here onwards) but is able to decarboxylate malonyl-S-ACPX to generate an acetyl unit for translocation to module 1. Interestingly, while KSX appears functionally analogous to specialized KSQ domains 12, its active site Cys residue is not replaced by a highly conserved Gln. KS0 domains are prevalent in trans-AT PKSs, and lack the active site His residue located within the HGTGT motif 2; however, KSX also retains this conserved His residue. To assay the overall activity of modules X, 1 and 2, a bimodular protein (KS1-AT1-DH1-KR1-ACP1-KS2-DH2-ER2-KR2) lacking ACP2 was expressed and purified; separately, stand-alone holo-ACP2 was also expressed and purified (Figure S3). The two proteins were assayed via a phosphopantetheine (PPant) ejection assay 13 in the presence of MBP-module X, truncated tAT (i.e., lacking its TE domain) and appropriate substrates. In this and all subsequent assays utilizing malonyl-CoA, this labile substrate was generated in situ by adding malonic acid, CoASH, ATP and the Streptomyces coelicolor malonyl-CoA synthetase MatB 14 to the reaction mixture.
Instead of detecting the anticipated hex-4-enoyl-PPant species, sorbyl-PPant was the major observed product (Figure 2b), implying that the enoylreductase (ER) domain of module 2 is inactive (designated ER0 from here onwards). Together, these results confirm our hypothesis that module X, module 1 and module 2 comprise the first three modules for initiation of NOCAP biosynthesis.
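The two assignments above rest on small, well-defined mass differences between acyl groups carried on the PPant arm. The following is a minimal sketch of that arithmetic, not the authors' analysis pipeline: the function names and the table `ACYL_ADDUCT` are illustrative, and the values are monoisotopic, whereas deconvoluted intact-protein spectra are usually reported as average masses.

```python
# Monoisotopic atomic masses in Da.
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.9949146}

# Net atoms added to the PPant thiol on acylation
# (acyl group minus the thiol hydrogen it displaces).
ACYL_ADDUCT = {
    "malonyl":     {"C": 3, "H": 2, "O": 3},
    "acetyl":      {"C": 2, "H": 2, "O": 1},
    "hex-4-enoyl": {"C": 6, "H": 8, "O": 1},  # product expected if ER2 were active
    "sorbyl":      {"C": 6, "H": 6, "O": 1},  # hexa-2,4-dienoyl, the observed product
}

def mono_mass(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MASS[element] * count for element, count in formula.items())

def shift(before, after):
    """Mass change (Da) when the ACP-bound acyl group changes from `before` to `after`."""
    return mono_mass(ACYL_ADDUCT[after]) - mono_mass(ACYL_ADDUCT[before])

decarboxylation = shift("malonyl", "acetyl")    # loss of CO2
er_inactive = shift("hex-4-enoyl", "sorbyl")    # loss of 2 H (no enoyl reduction)

print(f"malonyl -> acetyl:     {decarboxylation:+.4f} Da")
print(f"hex-4-enoyl -> sorbyl: {er_inactive:+.4f} Da")
```

The roughly 44 Da loss corresponds to decarboxylation of malonyl-S-ACPX, and the roughly 2 Da difference is what separates the anticipated hex-4-enoyl species from the sorbyl species in the PPant ejection assay.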
We hypothesized that the TE domain of the tAT-TE protein hydrolyzes acyl-ACPs under conditions of "stalled" polyketide biosynthesis 15. This suggestion was consistent with our earlier observation that use of the truncated tAT in place of the full-length tAT-TE resulted in a two- to ten-fold decrease in product formation 5. To test this hypothesis, we incubated Sfp-derived acetyl-S-ACP1 (a stalled acyl-ACP surrogate) with either tAT-TE or tAT. LC-MS analysis revealed that acetyl-S-ACP1 was hydrolyzed to holo-ACP1 in the presence of tAT-TE but not tAT (Figures 2c, S4). Analogous radiolabeling experiments further verified these findings (Figure S5). Together, these results provide strong evidence that this TE is a member of the "TEII" sub-family of thioesterases (designated TEII hereafter) that acts as a proofreading enzyme by hydrolyzing unproductive intermediates.
Buoyed by the reconstitution of modules X, 1 and 2, we endeavored to reconstitute the complete NOCAP synthase in vitro. To overcome its exceptionally large size (the synthase's homodimeric mass approaches 3 MDa), NOCAP_PKS1 was expressed and purified as three standalone proteins: modules 1 and 2 as one bimodular protein, module 3 as a unimodular protein, and module 4 along with the KS domain of module 5 (module 4-KS5) as the third protein. Separately, NOCAP_PKS2 was dissociated into two proteins: the DH-ACP-KR tridomain of module 5 fused to the complete module 6 (DH5-ACP5-KR5-module 6), and a bimodular protein comprised of modules 7 and 8 along with the terminal TR domain (modules 7-8-TR) (Figures 3a, S1). To facilitate intermodular chain translocation between separated modules, each protein was fused to complementary N-terminal and/or C-terminal docking domains from DEBS that have previously been shown to facilitate non-covalent interactions between successive modules on a PKS assembly line 16,17. These five NOCAP synthase-derived proteins were mixed with MBP-module X, tAT-TEII, malonyl-CoA, NADPH and S-adenosyl methionine. To confirm that products originated from the assembly-line PKS, [...] The observation of +11, +11 and +22 mass shifts for 1 in mixtures containing [2-13C]-, [1,3-13C2]- and [13C3]-malonyl-CoA, respectively, indicated that 1 traversed the entire polyketide synthase and underwent three rounds of chain elongation, ketoreduction and dehydration at module 5 (Figures 1, S6-7). Because biosynthesis of 1 requires the entire assembly-line PKS, we propose that 1 is the aglycone product of the NOCAP synthase. A closely related polyketide 2 was detected that had presumably undergone one fewer round of chain elongation, ketoreduction and dehydration than 1 (molecular formula C21H24O4, ob- [...]) (Figures S8-10). Its MS/MS fragmentation pattern matched well with that of 1, leading us to hypothesize that an upstream module was "skipped" during the biosynthesis of 2.
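The +11, +11 and +22 shifts follow from simple carbon bookkeeping, sketched below under stated assumptions: one malonyl-derived acetyl primer from module X, one extension at each of modules 1-4 and 6-8, three at the stuttering module 5, and loss of one carboxyl carbon of every malonyl unit as CO2. The names `EXTENSIONS_PER_MODULE`, `RETAINED_13C` and `expected_shifts` are illustrative and do not come from the original work.

```python
# Extension cycles catalyzed by each module of the collinear assembly line.
# Module 5 is the "split and stuttering" module that iterates three times.
EXTENSIONS_PER_MODULE = {1: 1, 2: 1, 3: 1, 4: 1, 5: 3, 6: 1, 7: 1, 8: 1}

# 13C atoms retained per incorporated malonyl unit after decarboxylation.
# One carboxyl carbon is lost as CO2; the thioester carbonyl and the
# methylene carbon stay in the chain.
RETAINED_13C = {
    "[2-13C]": 1,     # labeled methylene carbon is retained
    "[1,3-13C2]": 1,  # one of the two labeled carboxyl carbons is retained
    "[13C3]": 2,      # two of the three labeled carbons are retained
}

def malonyl_units(skipped=()):
    """Malonyl units incorporated: one primer (module X) plus all extensions."""
    extensions = sum(n for module, n in EXTENSIONS_PER_MODULE.items()
                     if module not in skipped)
    return 1 + extensions

def expected_shifts(skipped=()):
    """Nominal mass shift (Da) of the product for each labeled malonyl-CoA."""
    units = malonyl_units(skipped)
    return {label: units * retained for label, retained in RETAINED_13C.items()}

print(expected_shifts())              # full assembly line
print(expected_shifts(skipped=(3,)))  # hypothetical bypass of one module
```

With all modules active the model gives eleven malonyl units, matching the reported shifts for 1. Calling `expected_shifts(skipped=(3,))` yields +10, +10 and +20, which is only what this model predicts for a product that bypasses one single-extension module; it is not a measurement reported in the text.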
Two more minor polyketides, 3 and 4, were identified with MS/MS fragmentation patterns noticeably different from those of 1 and 2 (Figures S11-16). Compounds 3 and 4 are most likely premature polyketides with a pyrone moiety that originated from spontaneous release after module 7 extension and lactonization between C-1 and the C-5 oxygen.
For definitive structural analysis, we sought to produce the NOCAP synthase products by using E. coli as a heterologous host for scalable polyketide biosynthesis 6. Informed by the in vitro reconstitution experiments summarized above, we engineered three plasmids with compatible antibiotic resistance markers and origins of replication that collectively encode the pathway. Plasmid pCK-KPY222 encoded modules 1-2 as one bimodular protein and module 3. Plasmid pCK-KPY259 encoded module 4-KS5 and the intact NOCAP_PKS2, and pCK-KPY178 encoded the tAT-TEII and MBP-module X proteins (Figures 3b, S17). To enhance the malonyl-CoA pool in E. coli, pCK-KPY178 also encoded MatB and the Rhizobium leguminosarum malonate carrier protein MatC 18,19. Gratifyingly, E. coli BAP1[pCK-KPY222/pCK-KPY259/pCK-KPY178] produced 1 and 2 (Figures 3c, S18-23). E. coli-derived 1 and 2 had the same MS/MS fragments as 1 and 2 produced in vitro. Derivatization of 1 and 2 with Girard's reagent T 20 confirmed the presence of an aldehyde (Figures S24-27). Compounds 3 and 4 were much less abundant in extracts of this strain, suggesting that these metabolites do not arise under physiological conditions, but are only synthesized under conditions with excess substrates. Because of their scarcity in E. coli, 3 and 4 were not further characterized.
We isolated 1 and 2 as faint yellow solids from 4 L of E. coli BAP1[pCK-KPY222/pCK-KPY259/pCK-KPY178] using lipid extraction with methyl tert-butyl ether/methanol 21, C18 solid-phase extraction and UV-absorbance-guided semi-preparative HPLC (Figures S28-30). A number of 1D- and 2D-NMR experiments (1H, COSY, TOCSY, HSQC, HMBC, NOESY and ROESY) allowed us to fully elucidate their chemical structures (Figures 4, S31-62; Table S2). To determine the absolute configuration of the stereocenter set by the KR domain of module 4, we converted the C-15 hydroxyl substituent of 2 to a Mosher ester 23. Mosher ester analysis with COSY confirmed that the absolute configuration at C-15 is R, also as predicted by bioinformatic analysis 22 (Figures S63-66). Unlike 1, compound 2 featured a conjugated diene, not a triene, in its "tail". We therefore hypothesized that a combination of the dissociated-by-design nature of module 3 and the broad substrate tolerance of the KS domain of module 4 5 permitted facile translocation of the growing polyketide chain from module 2 to module 4. Indeed, an E. coli strain that does not express module 3 produced only 2, substantiating our hypothesis that the biosynthesis of 2 involves bypassing module 3 (Figure S67). Collectively, these spectroscopic efforts validated 1 as the aglycone product of the NOCAP synthase.
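As a consistency check on the triene versus diene assignment, the degrees of unsaturation implied by a molecular formula can be compared with the rings and double bonds in the proposed structures. The sketch below uses the formula C21H24O4 given for 2; the formula C23H26O4 for 1 is an assumption inferred here by adding one CH=CH unit to 2 and is not quoted from the text. Function names are illustrative.

```python
# Monoisotopic atomic masses in Da.
MASS = {"C": 12.0, "H": 1.00782503, "O": 15.9949146}

def mono_mass(c, h, o):
    """Monoisotopic mass of a neutral CcHhOo compound."""
    return c * MASS["C"] + h * MASS["H"] + o * MASS["O"]

def unsaturations(c, h):
    """Rings plus double bonds for a C/H/O-only compound: (2C + 2 - H) / 2."""
    return (2 * c + 2 - h) / 2

# Structural count: the resorcylaldehyde headgroup contributes one ring,
# three aromatic C=C and one aldehyde C=O, i.e. five unsaturations.
HEADGROUP = 5

def structural_count(tail_double_bonds):
    """Unsaturations expected for a headgroup plus a polyene tail."""
    return HEADGROUP + tail_double_bonds

compound_2 = (21, 24, 4)  # formula given in the text
compound_1 = (23, 26, 4)  # ASSUMPTION: compound 2 plus C2H2, not quoted from the text

# 2: one diene plus one triene in the tail; 1: two trienes.
print(unsaturations(21, 24), structural_count(2 + 3))
print(unsaturations(23, 26), structural_count(3 + 3))
print(f"{mono_mass(*compound_2):.4f}")
```

Both formulas agree with the structural count (10 for 2, 11 for 1), and the two compounds differ by the mass of C2H2 (about 26.016 Da), as expected if 2 lacks a single chain-extension, ketoreduction and dehydration cycle.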
This report represents two milestones. First, we describe for the first time the full reconstitution (both in vitro and in E. coli) of an assembly-line PKS that is predominantly comprised of trans-AT modules. trans-AT PKSs represent over 23% of all sequenced assembly-line PKSs according to a recent survey 24 and display remarkable architectural diversity; however, the understanding of trans-AT PKSs has significantly lagged behind that of cis-AT PKSs 2. Based on this report, the NOCAP synthase is a strong contender as a model trans-AT PKS, much as DEBS has been for cis-AT PKSs. Second, this work constitutes the first example of natural product discovery by reconstituting an orphan assembly-line PKS in vitro or in E. coli. In principle, the methodology described here could be applied to other orphan polyketide natural products, especially those synthesized in low abundance or from unculturable organisms 25. The discovery and structure elucidation of 1 will also allow us to turn our attention to the tailoring enzymes clustered with the NOCAP synthase genes and the ultimate characterization of the biological role of its fully decorated natural product. Such efforts are compellingly motivated by the statistically significant but nonetheless correlative occurrence of this PKS only in strains associated with clinical cases of nocardiosis.