Abstract
Biological records are omnipresent in paleontology, history, and climate science. Tree rings and ice cores provide evidence of environmental conditions, recorded in the composition of materials that are deposited over time and that carry a record of the events that influenced them before they were buried underneath ice or inside the trunk of a tree. We constructed a proof-of-concept synthetic circuit that can be used to create a similar chronological record of events in the DNA of living E. coli. In our system, phage-based serine integrases are employed to sequentially integrate pieces of DNA corresponding to the stimulus that is being detected. We show that placing attB and attP sites close together on a piece of DNA suppresses intramolecular reactions, and enables repeated integration events to expand a genetic locus proportionally to integrase induction and abundance of plasmid DNA. We also show that dCas9 binding can prevent integrase from reacting with an attachment site, so that we can control which piece of DNA is integrated by inducing different guide RNAs. These results represent significant steps towards an event logger that is capable of recording the ordering and magnitude of any number of molecular events. Such a system may be useful in studying complex biological phenomena such as biofilm formation, quorum sensing, or signaling in the gut.
Introduction
Living cells are capable of detecting and responding to sophisticated stimuli present in their environment. Light [1], heat [2], chemicals, and proteins [3] represent a few of the types of stimuli that cells can distinguish. The ability to detect these stimuli is useful to the cell’s survival if the cell can respond appropriately to the stimulus, for example by producing a heat-shock protein in response to heat. A biologist, however, must build sophisticated instruments to measure the same stimuli in order to know the magnitude and chronological order of the events that a cell has lived through. In this work, we describe a system that can create a record of the chemical stimuli a cell has seen within that cell’s DNA.
Previous work on DNA-based event detectors has focused on using phage integrases to irreversibly “flip” pieces of DNA in response to stimuli [4]–[6]. Phage integrases are extremely useful proteins to employ in this regard because their recombination of specific DNA sequences is deterministic, fast [7], and irreversible. However, integrase-based event recorders typically have a limited number of attainable DNA states, meaning that only a few events can be recorded before the memory capacity is “used up”. Previous work using Cas9 to stochastically excise memory units (conceptually similar to flipping memory units with integrases) in mouse stem cells [8] has shown that a limited number of memory units can be used to record the order and identity of cellular events over many generations. However, lineage information or cellular events can only be recorded until all memory units have been excised or flipped, which places a fundamental limit on such event recording systems.
The CRISPR system represents a natural chronological record [9] of stimuli where pieces of DNA corresponding to phages are inserted into the genome in the order in which the phages were encountered. Phage genomes are chopped into short oligos, which get inserted into the front of the CRISPR array through the action of the Cas1 and Cas2 proteins. In so doing, the CRISPR system keeps inserting more phage sequences and making the CRISPR array longer, in contrast with more limited integrase-based memory. More recently encountered phages appear closer to the promoter at the front of the CRISPR array, so those guides are produced in greater abundance than older guides that reside farther down the array. This allows the cell to focus its immune defenses against more pressing threats, while eventually forgetting the faces of long-vanquished foes.
Several groups have endeavored to harness this recording system to encode the presence of electroporated oligos [10] and chemical stimuli [11] in a “DNA tape recorder” type of circuit that takes advantage of random spacer acquisition during overexpression of Cas1-2 proteins. In particular, Sheth et al. [11] have developed a circuit that utilizes an inducible copy number plasmid to convert a chemical stimulus into DNA abundance, which is in turn reflected in the identity of “random” DNA spacers acquired in the CRISPR array. Using this system, Sheth et al. can identify the presence or absence and the chronological order of three different chemicals over a period of four days.
We designed and built a conceptually similar system that can record the chronological order of stimuli in repeatedly integrated DNA elements, using phage integrases instead of cas1-2. Our event logger consists of three components. A set of “data plasmids” serves as a source of DNA for integration, taking advantage of plasmid replication to maintain a pool of un-integrated DNA. A synthetic genetic network serves as the control system which converts stimulus detection into plasmid integration. An engineered genomic integration site allows for simplified extraction and purification of the integrated fragments for sequencing and read-out.
We believe that our system offers several advantages over that developed by Sheth et al. First, phage integrases are much more efficient and their recognition site is better defined than that of Cas1-2, which allows our system to react faster than Cas1-2 while being less toxic, since phage integrases will not interact with the E. coli genome if their cognate attachment site is not present. Second, our system can allow integration of any size of DNA fragment, which can lead to wider applications such as stimulus-directed pathway assembly or programmed integration of promoters and other active genetic elements. We envision these systems being used to produce “molecular sentinels”: bacteria that can be seeded in a river, a waste treatment plant, or a gut microbiome to record the chemicals or hormones present over time in a much less obtrusive way than conventional means allow.
Results
Serine integrases catalyze a recombination reaction between attP and attB sites, converting these into unreactive attL and attR sites [12]. To allow repeated integration into the same site, a data plasmid must contain both attP and attB sites, so that the incoming attB replaces the genomic attB that is destroyed by the recombination. This presents a challenge because intramolecular attachment sites may be recombined with much higher efficiency than intermolecular sites, meaning that data plasmids will be consumed in non-constructive reactions faster than they can be integrated into the genome. We determined that placing parallel attachment sites closer than 100 bp apart, as measured from the edges of the attB and attP sequences, decreases the rate of intramolecular recombination, and that the rate falls further as the spacing is reduced (Figure 1). The minimum intramolecular recombination rate occurs at 0 bp spacing, which gives less than 5% of the intramolecular activity seen with 100 bp spacing.
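The need for both sites on the data plasmid can be illustrated with a token-level toy model. This is only a sketch under stated assumptions: the locus is a list of labels rather than real sequence, the function names are hypothetical, and the data plasmid is assumed to be laid out (reading from its attP) as attP, payload, attB. Intramolecular recombination and plasmid multimerization are ignored.

```python
# Toy model: the genomic locus is a list of tokens, not real DNA sequence.
# Assumption: each data plasmid, read from its attP, is laid out as
# attP -> payload -> attB, so the payload lands upstream of the new attB.

def integrate(locus, payload):
    """Recombine the single attB in the locus with a data plasmid's attP."""
    i = locus.index("attB")  # raises ValueError if no attB remains
    # attB x attP gives attL ... attR; the plasmid's own attB is carried in.
    return locus[:i] + ["attL", payload, "attB", "attR"] + locus[i + 1:]

def integrate_without_attB(locus, payload):
    """Same reaction for a hypothetical plasmid that carries only attP."""
    i = locus.index("attB")
    return locus[:i] + ["attL", payload, "attR"] + locus[i + 1:]

locus = ["attB"]
for payload in ["A", "B", "C"]:
    locus = integrate(locus, payload)

print(locus)
# A plasmid lacking attB leaves no site for a second integration:
print(integrate_without_attB(["attB"], "A"))
```

In this model the locus always keeps exactly one reactive attB, and the payloads accumulate in the order in which they were integrated. With the opposite plasmid layout the order would be reversed, which is why the layout is flagged as an assumption.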
Next, we constructed a proof-of-concept event logger system consisting of a single data plasmid and tetracycline-inducible Bxb1 integrase. Upon induction, integrase was able to catalyze genome integration and data plasmid multimerization in vivo, proportional to the strength of the inducer pulse seen by the cells (Figure 1). Only one hour of induction with aTc concentrations ranging from 1 to 64 nM generated different amounts of integration, with 32 and 64 nM resulting in nearly complete integration (intermolecular or genomic) of all data plasmids. We sought to control the population of data plasmids by arranging the integrase attachment sites in a translational fusion with chloramphenicol resistance. Thus, data plasmids inserted into the genome or multimerized would not produce a functional antibiotic resistance gene, and the cell would be forced to maintain a population of un-recombined data plasmids to allow for recording future stimuli. However, multimerized data plasmids were not seen to decrease in number even after additional culturing of the cells for 12 hours following the application of the inducer pulse (data not shown). In addition, entire plasmids were integrated into the genome with this system, resulting in multiple functional ColE1 replication origins being present in the genome following integrase induction. We were unable to isolate cells containing different numbers of genome-integrated plasmids, possibly because these high-copy origins were resulting in polyploidy.
We also wanted to allow recording of the chronological order and duration of multiple events. To this end, the integrase must select between identical attachment sites to integrate different data plasmids into the genome. We have previously shown that catalytically inactive Cas9 (dCas9) can bind specific attachment sites and prevent Bxb1 integrase from binding them in cell-free extract [13]. Here we show that this system works in live E. coli (Figure 2). A plasmid containing two integrase attachment sites can be made to preferentially integrate at one or the other by co-expression of dCas9 and the appropriate guide RNA. This behavior is dependent on dCas9 expression, but a slight leak of pLac-driven guide RNAs causes colonies nominally expressing only dCas9 to appear similar to colonies expressing dCas9 and the second (pLac) guide RNA in this experiment. As a proof of principle that this system is sensitive to induction magnitude, we also tried activating guide RNA and dCas9 production in advance of integrase production, to see whether pre-production of guide RNA and dCas9 complexes can affect the magnitude of the attachment site repression effect (Figure 2C). We found that pre-incubation with inducers for 100 min can increase the magnitude of site selection about two-fold.
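The selection logic described above can be written down as a small, idealized sketch. The site and guide names are hypothetical, and blocking is treated as all-or-nothing, whereas the measured effect is partial (about two-fold with pre-incubation) and is affected by guide RNA leak.

```python
def reactive_sites(sites, guides_expressed, dcas9_present):
    """Return the attachment sites that integrase can still react with.

    sites maps a site name to the guide RNA that targets it
    (hypothetical names). Blocking is idealized as complete.
    """
    if not dcas9_present:
        # Without dCas9 the guide RNAs alone block nothing.
        return sorted(sites)
    return sorted(s for s, g in sites.items() if g not in guides_expressed)

# Hypothetical two-site plasmid, one guide per site.
SITES = {"site_1": "guide_1", "site_2": "guide_2"}

print(reactive_sites(SITES, {"guide_1"}, dcas9_present=True))
print(reactive_sites(SITES, {"guide_2"}, dcas9_present=True))
print(reactive_sites(SITES, {"guide_1"}, dcas9_present=False))
```

Inducing the guide for one site leaves only the other site reactive, so the stimulus that induces a guide selects against the site it targets rather than for it.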
We next sought to construct a data plasmid which combined dCas9 attachment site selection and continuous plasmid integration. The main goals of this design were to produce a plasmid which would not result in having high copy plasmid origins integrated into the genome, and which would contain two data plasmids carried on the same piece of DNA. TP901 can be used to excise these minimal data plasmids into minicircles which only contain the Bxb1 attB and attP sites necessary for genome integration. Once the minicircles are excised, the data plasmid must be prevented from replicating, or else the cells would fill up with plasmids that had already excised their minicircles, and no further genome integration could take place.
This is accomplished in two ways. First, the promoter necessary to drive plasmid replication from the ColE1 origin [14] can be placed within one minicircle. This way, when the minicircle is excised, the origin will stop replicating. Second, transcriptional interference from antisense promoters can be used as a way to repress sense promoters [15]. Thus, a second minicircle containing a terminator can be placed after the ColE1 origin. Once that terminator is excised along with its minicircle, the antisense promoter will also repress plasmid replication (Figure 3A). In this way we can construct a data plasmid that can give rise to two minicircles, excision of either of which results in a plasmid that cannot replicate. This will serve to control the population of plasmids within the cell, maintaining only the plasmids which have not had their minicircles excised. Upon induction of minicircle excision, we see that the population of un-excised plasmids is maintained only in the case where minicircle excision affects plasmid replication ability. More experiments must be done, however, to confirm this effect.
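The intended replication logic of this data plasmid reduces to a two-input truth table. The sketch below is an idealized Boolean reading of the design, with a hypothetical function name; it says nothing about how completely replication is shut off in practice.

```python
from itertools import product

def can_replicate(mc1_excised, mc2_excised):
    """Idealized design logic for the two-minicircle data plasmid."""
    # Minicircle 1 carries the promoter that drives the origin.
    promoter_present = not mc1_excised
    # Minicircle 2 carries the terminator that shields the origin
    # from the antisense promoter.
    terminator_present = not mc2_excised
    return promoter_present and terminator_present

for mc1, mc2 in product([False, True], repeat=2):
    print(f"mc1 excised={mc1!s:5} mc2 excised={mc2!s:5} "
          f"replicates={can_replicate(mc1, mc2)}")
```

Only the plasmid with both minicircles still in place is expected to replicate, which is the property that should keep a pool of un-excised data plasmids in the cell.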
Discussion
We have developed a proof-of-concept system that allows tape recorder-like sequential recording of stimuli in a bacterium’s DNA. The basic idea is to allow a bacterium to choose between a set of data plasmids to integrate, depending on the stimuli that are perceived. The attB and attP sites present on the data plasmids allow continued integration of these plasmids into a single attB site in the genome, and selective expression of guide RNAs from chemical-sensitive promoters will result in binding and “repression” of attachment site activity, allowing the system to “choose” a data plasmid to insert from a set of possible varieties. In this report we have described a series of steps approaching a complete system capable of genetic recordings, but we are still working to obtain sequences of these repeatedly integrated genome sites. Our goal is to be able to start with a genome site sequence and then determine what sequence of stimuli must have produced it.
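To make the read-out goal concrete, the following is a minimal sketch of how a sequenced locus might be decoded, assuming that each data plasmid carries a unique barcode and that position along the locus reflects integration order. The barcodes, spacer, and stimulus labels are invented for illustration and are not sequences from this work.

```python
def decode_locus(locus_seq, barcodes):
    """Return stimulus labels in the order their barcodes occur in locus_seq."""
    hits = []
    for label, barcode in barcodes.items():
        pos = locus_seq.find(barcode)
        while pos != -1:
            hits.append((pos, label))
            pos = locus_seq.find(barcode, pos + 1)
    return [label for _, label in sorted(hits)]

# Hypothetical barcodes and spacer, chosen only for this example.
BARCODES = {"stimulus_1": "AAACCC", "stimulus_2": "GGGTTT"}
SPACER = "CATG"

locus = SPACER.join([
    BARCODES["stimulus_1"],
    BARCODES["stimulus_2"],
    BARCODES["stimulus_1"],
])
print(decode_locus(locus, BARCODES))
```

A real decoder would also need to handle sequencing errors, barcode orientation, and the direction in which the array grows, none of which are modeled here.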
Sheth et al. [11] have developed a similar system, taking advantage of the random spacer integration afforded by Cas1-2 protein overexpression to integrate portions of a “trigger plasmid”, whose copy number is varied by inducing Rep protein expression in response to a chemical stimulus. The Sheth et al. method shares many similarities with the method proposed here, but we believe that our method is inherently more flexible. Because our event logger uses serine integrases, we can take advantage of the fact that they are capable of integrating large pieces of DNA. Many different integrases have also been characterized, offering a choice of attachment site specificities [16].
Thus, our system has greater flexibility in terms of what the final genome array sequence will be. One can imagine a system where entire genes are integrated sequentially, producing a complex operon that is defined by the order of stimuli that a cell has seen. Integrases are known to be quite fast and efficient at recombining DNA, which means we could get away with very low integrase expression, while cas1-2 may have to be driven at expression levels that would stress the cell.
Materials and Methods
Cell strains
Cells used were DH5alpha Z1 from Lutz et al [17]. Genome site constructs were made by Gibson assembly into SpeI-KpnI digested pOSIP KH or pOSIP KO from Pierre et al [18], followed by genome integration and pE-FLP excision protocol as described.
Constructs
Bxb1 integrase sequence was amplified from the Dual-recombinase-controller vector, which was a gift from Drew Endy (Addgene plasmid # 44456) [6]. dCas9 was amplified from pAN-PTet-dCas9, which was a gift from Christopher Voigt (Addgene plasmid # 62244) [19]. Guide RNA sequences were G1: GTTGACcagacaaacccatt, G2: GTTGACcagacaaacctagt, G3: GTTGACcagacaaaccaatg, sgRNA scaffold sequence: GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC. Attachment site sequences were Bxb1 attB: GGCTTGTCGACGACGGCGTGCTCCGTCGTCAGGATCAT, Bxb1 attP: GGTTTGTCTGGTCAACCACCGCGTGCTCAGTGGTGTACGGTACAAACC, TP901 attP (mc1): GCGAGTTTTTATTTCGTTTATTCAAATTAAGGTAACTAAAAAACTCCTTT, TP901 attB (mc1): AACACAATTAACATCCAAATCAAGGTAAATGCTTT, TP901 attB (mc2): AACACAATTAACATCTCAATCAAGGTAAATGCTTT, TP901 attP (mc2): GCGAGTTTTTATTTCGTTTATTTCAATTAAGGTAACTAAAAAACTCCTTT. The ColE1 origin with no promoter was made by PCR with PCOLE1NOP001_F: AAAGGTCTCAGCTTGCAAACAAAAAAACCACCGCTACC and PCOLE1NOP002_R: AAAGGTCTCACTCCGCCAGGAACCGTAAAAAGGCC. The promoter in Minicircle 1 was tttacggctagctcagtcctaggtatagtgctagc, and the promoter in front of Minicircle 2 was ttgacagctagctcagtcctaggtataatactag.
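The relationship between the listed guide RNAs and the Bxb1 attP site can be checked directly from the sequences. The sketch below is our own reading of the listed sequences rather than a statement made in the text: it compares the first 16 nt of each guide with the reverse complement of the first 16 nt of attP, and prints the remaining 4 nt, which differ between guides.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (case-insensitive input)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

# Sequences copied from the Constructs section.
BXB1_ATTP = "GGTTTGTCTGGTCAACCACCGCGTGCTCAGTGGTGTACGGTACAAACC"
GUIDES = {
    "G1": "GTTGACcagacaaacccatt",
    "G2": "GTTGACcagacaaacctagt",
    "G3": "GTTGACcagacaaaccaatg",
}

attp_edge = reverse_complement(BXB1_ATTP[:16])
print("reverse complement of attP edge:", attp_edge)

for name, guide in GUIDES.items():
    guide = guide.upper()
    shared, variable = guide[:16], guide[16:]
    print(name, "matches attP edge:", shared == attp_edge,
          "| last 4 nt:", variable)
```

All three guides share the same 16 nt complementary to one edge of attP and differ only in their final 4 nt, which would be consistent with each guide being specific to the sequence flanking one particular attP copy. That interpretation is an inference and should be confirmed against the construct maps.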
Experiments
Cells were grown to an OD of 0.2, then inducers were added. The inducers used were 1-64 nM anhydrotetracycline (Sigma), 0.2 µM sodium salicylate (Sigma), 0.2% arabinose (Teknova), and 1 mM isopropyl-beta-D-thiogalactoside (Sigma).
Cells were subsequently grown for varying amounts of time. Then, 10 µL of cells were transferred into another 1 mL of culture and grown for a further 12 hours.