Experimental tests of functional molecular regeneration via a standard framework for coordinating synthetic cell building

The construction of synthetic cells from lifeless ensembles of molecules is expected to require integration of hundreds of genetically-encoded functions whose collective capacities enable self-reproduction in simple environments. To date the regenerative capacities of various life-essential functions tend to be evaluated on an ad hoc basis, with only a handful of functions tested at once and only successful results typically reported. Here, we develop a framework for systematically evaluating the capacity of a system to remake itself. Using the cell-free Protein synthesis Using Recombinant Elements (PURE) as a model system we apply our framework to evaluate the capacity of PURE, whose composition is completely known, to remake 36 life-essential functions. We find that only 23 of the components can be well tested and that only 19 of the 23 can be remade by the system itself; translation release factors remade by PURE are not fully functional. From both a qualitative and quantitative perspective PURE alone cannot remake PURE. We represent our findings via a standard visual form we call the Pureiodic Table that serves as a tool for tracking which life-essential functions can work together in remaking one another and what functions remain to be remade. We curate and represent all available data to create an expanded Pureiodic Table in support of collective coordination among ongoing but independent synthetic cell building efforts. The history of science and technology teaches us that how we organize ourselves will impact how we organize our cells, and vice versa.



Introduction
From a first-principles perspective the engineering of physical systems capable of self-reproduction [14-15] must satisfy two criteria (Fig. 1C). First, qualitatively, the system enacting the instructions for reproducing the system must be capable of producing a functional copy of the system at both the level of individual components and as an integrated whole. Second, quantitatively, the system must possess sufficient generative capacity to make or organize at least as much material as needed to reproduce. For bottom-up synthetic cell building efforts the qualitative criterion means that the DNA encoding the system, when expressed in the environment defined within the synthetic cell alone, must result in fully functional molecules for all so-encoded molecules. Satisfying the quantitative criterion means, for example, that the total number of peptide bonds used to instantiate the proteins comprising the system can be catalyzed by the system within a single reproduction cycle.

The Protein synthesis Using Recombinant Elements (PURE) system enables transcription and translation of user-defined DNA [16]. PURE itself is well established as a research tool and benefits from reliable commercial and informal supply chains [17][18]. Many have imagined PURE or its analogs as a compelling biomolecular foundation upon which to build synthetic cells.

Fig. 1. Can we build cells from lifeless ensembles of independently-sourced natural biomolecules? (a) We have the capacity to source, encapsulate, and encode the molecules thought to be essential for cellular functions. (b) Success would result in the capacity to produce autonomous reproducing simple cells comprised only of known components. (c) However, success requires that two abstract conditions be met. First, the ensemble itself must be capable of regenerating the functionality of all components comprising the ensemble (i.e., qualitative reproduction). Second, given energy and materials, the ensemble must be capable of remaking more of itself (i.e., quantitative growth).

We started by purifying all 20 tRNA synthetases (aaRSs) from E. coli (Fig. S2). We then tested whether the purified synthetases were functional by adding them, all together, to the PURE Solution B that lacked its own aaRS set and measuring expression of a green fluorescent protein (GFP) (Fig. S4A, Fig. S3, red curve). We chose GFP as a simple, easily accessible reporter for our study (Table S1). PURE Solution B lacking any aaRSs produced almost no GFP whereas GFP was well-expressed by PURE Solution B containing commercially supplied aaRSs (Fig. S3). We observed that the expression profile of PURE using our newly-purified aaRSs nearly matched that of the commercial mixture, suggesting that our so-purified aaRSs were all functional and could be used to construct each of the 20 aaRS single-depletion PURE mixtures.

To construct the remaining 16 single-enzyme depletion PURE variants we sourced six different custom PURE Solution B kits (NEB), each with a different subset of missing enzymes (Fig. S4). We then made all remaining single-enzyme depletion PURE variants by supplementing each depletion subset with the appropriate combinations of enzymes that we had independently purified from E. coli or sourced commercially (Methods).

We next measured GFP expression capacity for each single-component depletion (SCD) (Fig. 2 [...] general, and our reporter gene (GFP) requires at least one of each of the 20 standard amino acids (Table S1). For example, PUREΔ(AlaRS) (i.e., PURE lacking the alanine tRNA synthetase) produced ~98% less GFP than intact PURE (Fig. 2AB). However, we observed that each SCD produced a different level of GFP expression, from no change to almost complete loss of expression (Fig. 2C).

We used an expression-loss threshold of 75 percent of full PURE activity to select which SCDs to advance for use in testing whether PURE-made components can reconstitute PURE. We selected this threshold empirically to provide sufficient dynamic range in our subsequent reconstitution experiments; we expect that optimized reporters and improved measurement methods will be helpful in evaluating the thirteen components we did not consider further here (nine aaRSs, IF1, [...]

[...] reactions, allowing us to more directly discriminate between differences in expression arising via differences in functional activity of components due to source (i.e., PURE or intact cells), versus differences in expression arising merely due to differences in the concentration of [...]

To represent if and to what extent a system can remake itself at a single-component level we developed standard quantitative metrics that can be used for any component whose function can be transduced to expression of a reporter gene. Specifically, we used the levels of GFP expression from each SCD and PMC-complemented SCD relative to the GFP expression level obtained via intact PURE to define depletion and recovery scores, respectively (Fig. 4A). We used a split-box template and color map to visualize both values (Fig. 4B). We used this method of representation to abstract and quickly summarize the capacity of our assay to detect depleted components, as well as the capacity of the system to remake its individual components. As one example, depletion of Initiation Factor 2 (IF2) did not result in sufficient reduction of expression to warrant further study here (Fig. 4C). As a second example, depletion of HisRS reduced GFP expression to ~0.47 of intact PURE, and subsequent complementation recovered ~0.86 of intact levels (Fig. 4D) [...] to best organize our representations of the capacities of fundamental life-essential functions. We grouped the split-box for each component into higher-order life-essential clusters (e.g., transcription, aminoacylation, recycling of tRNA, ribosomes, energy carriers, and translation factors). Because we chose to study the PURE system here, as have others, as a foundation from which to support ongoing bottom-up synthetic cell building efforts, we named our integrated visual representation the "Pureiodic Table" (Fig. 5). Going forward, as still-more life-essential and life-contributing functions are added and evaluated, we imagine that "pure" will refer only to the fact that purified components are being tested within the context of a well-defined ensemble of molecules.
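The depletion and recovery scores described above can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code used for this work. It assumes that each score is a simple ratio of endpoint GFP signal to the intact-PURE signal, and that the 75 percent threshold means an SCD is advanced when its residual expression is at or below 0.75 of intact PURE. The function name, field names, and the normalization of intact PURE to 1.0 are hypothetical; the HisRS values are those quoted in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComponentScores:
    name: str
    depletion: float            # GFP from the SCD / GFP from intact PURE
    advanced: bool              # True if depletion passes the threshold
    recovery: Optional[float]   # GFP from complemented SCD / intact PURE


def score_component(name, gfp_intact, gfp_scd, gfp_complemented=None,
                    threshold=0.75):
    """Compute depletion and recovery scores relative to intact PURE.

    Assumption: an SCD is advanced to complementation testing when its
    residual expression is <= threshold (0.75 of intact PURE here).
    """
    if gfp_intact <= 0:
        raise ValueError("intact PURE signal must be positive")
    depletion = gfp_scd / gfp_intact
    advanced = depletion <= threshold
    recovery = None
    if advanced and gfp_complemented is not None:
        recovery = gfp_complemented / gfp_intact
    return ComponentScores(name, depletion, advanced, recovery)


# HisRS example from the text, with intact PURE normalized to 1.0.
his = score_component("HisRS", gfp_intact=1.0, gfp_scd=0.47,
                      gfp_complemented=0.86)
print(his)
```

Each `ComponentScores` record holds the two values that fill one split-box: the depletion score for one half and the recovery score, when available, for the other.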

Practically, from the initial table, we can quickly discern that the blue/blue boxes represent components that have the greatest certainty of functional auto-regeneration, including AlaRS, ArgRS, EF-G, EF-Ts, EF-Tu, IF3, CK (creatine kinase), and IP (inorganic pyrophosphatase). Perhaps more importantly, we can also quickly identify components that still need validation (red/empty boxes) as well as components that cannot now be sufficiently regenerated. For example, PURE-made RRF, RF1, and RF2 did not recover PURE-based gene expression in PMC-complemented SCDs. Finally, in addition to presenting the properties at a single-[...] translation elongation factor could be removed and fully regenerated as measured by our individual assays but, when all three components were depleted and complemented together, only ~0.60 of full PURE activity was recovered, suggesting that redundant activities within this trio mask partial loss of function in the SCD-based assays.

Well aware that realizing fully-reproducing synthetic cells is expected to involve integration and testing of hundreds of components, and that others are pursuing and reporting on the regenerative capacities of molecular systems including PURE, we sought to explore integration of reported results across groups in the form of an expanded Pureiodic [...]

[...] to remake these 23 enzymes and successfully purified 21 of the 23 attempted. We added each enzyme back to its cognate dropout PURE and measured any recovery in gene expression levels. 19 of the 22 enzymes tested recovered some or all of full PURE activity. We developed a standard quantitative visual framework for representing our results, the Pureiodic Table, as a tool for enabling community formation and fellowship among and within synthetic cell building efforts.

The PURE system is represented to be a minimal transcription and translation system, in the sense that expression should decrease (at least 50% reduction across all enzymes and over 90% reduction for 20 enzymes) if any single component is removed from the system [16]. Instead we found that 13 of the 36 individual dropouts tested here did not result in significant reductions in gene expression. One difference is our use of GFP as a reporter versus DHFR in the original study and any resulting differences in demand for individual amino acids (Table S1).

Development and testing of reporters optimized for probing the reproductive capacity of PURE, and of genetic devices that transduce other biochemical activities into gene expression signals, can improve and expand the scope of future testing efforts.

We found that three enzymes (RRF, RF1, and RF2) did not result in recovered PURE activity when tested. As noted, we could not verify production of RRF via SDS PAGE due to its small size even though we were able to detect in the purified sample a 0.[...] Complementing PURE with the needed methylation function is an obvious next step that should also include additional tests that ensure the so-added methylase can be remade via the expanded PURE.

We note that while many PURE functions could be individually removed and restored via PURE-made copies, several groupings of functions performed differently when tested together. For example, IF1 or IF2 could be individually removed with no impact on gene expression but could not be removed in combination with IF3. As a second example, all three elongation factors could be individually removed and replaced with near total loss and full recovery of gene expression. However, when all three were simultaneously replaced with PURE-made versions, gene expression only recovered to ~60 percent of E. coli-made PURE levels. As a third example, the four energy recycling factors only recovered ~40 percent of E. coli-made PURE levels when replaced all together. The difference between individual and ensemble recovery levels may be due to compounding impacts of partially-functional and complementary components. For example, no individual kinase showed a recovery of less than 79 percent yet the recovery via the PURE-made 'energy recycling' ensemble was only 28 percent. If each kinase were entirely functionally independent of the others we would expect 54 percent recovery across the ensemble (0.85 × 0.85 × 0.79 × 0.94), suggesting that among these four components there may be some functional complementation occurring in the single-component assays.
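The independence estimate quoted above is a product of the single-component recovery fractions. The following Python sketch reproduces that arithmetic for the four kinase values given in the text; the function name is hypothetical, and the multiplicative model is only the simplifying assumption stated in the paragraph, not a claim about how the components actually interact.

```python
from math import prod


def expected_ensemble_recovery(single_recoveries):
    """Expected ensemble recovery if each component's partial loss of
    function compounds independently (simple multiplicative model)."""
    return prod(single_recoveries)


# Single-component recovery fractions quoted in the text for the four
# energy recycling kinases.
kinase_recoveries = [0.85, 0.85, 0.79, 0.94]
expected = expected_ensemble_recovery(kinase_recoveries)
print(f"expected ensemble recovery: {expected:.2f}")  # prints 0.54
```

Comparing this expectation against the measured ensemble recovery gives a quick check on whether single-component assays are overstating the functionality of individual PURE-made components.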

A simpler-to-set-up assay that avoids the need to purify PURE-expressed enzymes involves providing the template encoding the component(s) of interest directly into the starting PURE mixture. So long as there is sufficient PURE activity in the starting mixture to kickstart the process, the PURE mixture itself remakes the partially-depleted component. We explored this approach using AlaRS as a test case, first observing GFP expression dynamics in PURE containing various initial concentrations of AlaRS, with and without DNA expressing AlaRS (Fig. S1) [...] transfer technique, whereby the initial concentration of the target components is increasingly minimized. Discrete depletion and complementation assays have the advantage of eliminating any crosstalk from the original non-PURE expressed enzymes and offer a simpler readout. However, fractional depletion and dilution approaches may serve to improve dynamic range for some components and enable more widespread testing.

We hope that the expanded Pureiodic Table (Fig. 6) will serve as a framework for tracking humanity's collective capacity to construct life from scratch. Many additional life-essential functions remain to be added and tested (e.g., replication, metabolism, membrane formation, and cytokinesis). Each life-essential function so-represented serves as a visual icon of what is required to support autonomous reproduction. We also expect that factors improving ensemble performance, including dynamic control and expression fine-tuning, will be required and should be represented. For example, the addition of chaperones and Elongation Factor P (EF-P) to PURE has been shown to increase protein yield and quality [17]. There may also be chemical or physical aspects essential to cell building that are not directly genetically encoded. As the Pureiodic [...]

Plasmids and strains
All plasmids used in this work will be made freely available as the "Pureiodic Table Construction [...]

[...] After another 9-15 hours of growth, we pelleted cell cultures via centrifugation at 5000 x g at 4 °C for 20 minutes (JA-10 rotor). We resuspended cell pellets in 25 ml of equilibration buffer in 50 ml Falcon tubes and sonicated four times using a microtip (duty cycle 50%, 45 seconds treatment with 2 minutes on ice in between treatments). We centrifuged the resulting solution at 15000 x g at 4 °C for 60 minutes (JA-10 Fixed-Angle Rotor, Beckman Coulter). We placed Ni-NTA slurry (Thermofisher) in columns with 20 ml of equilibration buffer running through followed by the sample-containing supernatant followed by 20 ml of wash buffer and 5 ml of elution buffer. We then ran the eluted mixture through an FPLC (AKTA PURE) with a salt gradient consisting of mixtures of buffer A (50 mM HEPES) and buffer B (50 mM HEPES plus 1 M sodium chloride). We combined protein fractions, exchanged the buffer for storage (Amicon filter), and stored the so-purified proteins at -80 °C following flash freezing.
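As a small worked illustration of the salt gradient, the Python sketch below computes the nominal composition of a buffer A / buffer B mixture at a given fraction of buffer B. It assumes ideal linear mixing by volume and uses only the buffer A and buffer B compositions stated above; the function name and the returned keys are hypothetical, and no particular gradient program is implied.

```python
def gradient_composition(fraction_b, hepes_mm=50.0, nacl_b_mm=1000.0):
    """Nominal composition of a mixture of buffer A (50 mM HEPES) and
    buffer B (50 mM HEPES plus 1 M NaCl), assuming ideal volume mixing.

    fraction_b is the volume fraction of buffer B, from 0.0 to 1.0.
    """
    if not 0.0 <= fraction_b <= 1.0:
        raise ValueError("fraction_b must be between 0 and 1")
    return {
        "HEPES_mM": hepes_mm,                # identical in both buffers
        "NaCl_mM": fraction_b * nacl_b_mm,   # only buffer B carries NaCl
    }


for frac in (0.0, 0.25, 0.5, 1.0):
    print(frac, gradient_composition(frac))
```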

Protein purification from PURE
We added T7-expression vectors encoding strep-tagged proteins (5 nM concentration) in 200 µl of total PURE (NEB) working volume. We pre-washed 500 µl of strep-tactin beads (IBA Lifesciences) three times using 1 ml of wash buffer (Buffer W, IBA Lifesciences) before adding the entire PURE reaction volume and incubating at 4 °C for 3 hours. We immobilized the magnetic beads using a magnet and washed (5x) via pipetting 1 ml of wash buffer. We added elution buffer (IBA Lifesciences) and incubated at 4 °C for 10 minutes. We then immobilized the beads while collecting the supernatant, exchanged buffers for storage (Amicon filter), and verified quality via gel separation and mass via a spectrophotometer (Nanodrop). We stored the proteins at -80 °C following flash freezing.

Buffer recipes
Our equilibration buffer consisted of 50 mM sodium phosphate, 300 mM sodium chloride, and 10 mM imidazole. Our wash buffer consisted of 300 mM sodium chloride and 25 mM imidazole. Our elution buffer consisted of 300 mM sodium chloride and 250 mM imidazole. Our regeneration buffer consisted of 20 mM MES sodium and 100 mM sodium chloride.

[...] with given three letter abbreviations. (a) the number of residues for each amino acid in the reporter gene used here (green fluorescent protein, GFP). (c) the concentration of each cognate amino acid's tRNA synthetase in the PURE system. (d) the "demand" for a given amino acid normalized to the capacity of PURE to recharge spent tRNA. (r) the "regeneration number" as defined by the number of residues for each amino acid in its cognate tRNA synthetase. (x) the ratio of "demand" to "regeneration number."