Promoter-sequence determinants and structural basis of primer-dependent transcription initiation in Escherichia coli

Chemical modifications of RNA 5′ ends enable “epitranscriptomic” regulation, influencing multiple aspects of RNA fate. In transcription initiation, a large inventory of substrates competes with nucleoside triphosphates (NTPs) for use as initiating entities, providing an ab initio mechanism for altering the RNA 5′ end. In Escherichia coli cells, RNAs with a 5′-end hydroxyl are generated by use of dinucleotide RNAs as primers for transcription initiation, “primer-dependent initiation.” Here we use massively systematic transcript end readout (“MASTER”) to detect and quantify RNA 5′ ends generated by primer-dependent initiation for ~$4^{10}$ (~1,000,000) promoter sequences in E. coli. The results show primer-dependent initiation in E. coli involves any of the 16 possible dinucleotide primers and depends on promoter sequences in, upstream of, and downstream of the primer binding site. The results yield a consensus sequence for primer-dependent initiation, $Y_{TSS-2}N_{TSS-1}N_{TSS}W_{TSS+1}$, where TSS is the transcription start site, $N_{TSS-1}N_{TSS}$ is the primer binding site, Y is pyrimidine, and W is A or T. Biochemical and structure-determination studies show that the base pair (nontemplate-strand base:template-strand base) immediately upstream of the primer binding site ($Y{:}R_{TSS-2}$, where R is purine) exerts its effect through the base on the DNA template strand ($R_{TSS-2}$) through inter-chain base stacking with the RNA primer. Results from analysis of a large set of natural, chromosomally-encoded E. coli promoters support the conclusions from MASTER. Our findings provide a mechanistic and structural description of how TSS-region sequence hard-codes not only the TSS position, but also the potential for epitranscriptomic regulation through primer-dependent transcription initiation.


Introduction
In transcription initiation, the RNA polymerase (RNAP) holoenzyme binds promoter DNA by making sequence-specific interactions with core promoter elements and unwinds a turn of promoter DNA forming an RNAP-promoter open complex (RPo) containing a single-stranded "transcription bubble." Next, RNAP selects a transcription start site (TSS) by placing the start-site nucleotide and the next nucleotide of the "template DNA strand" into the RNAP active-center product site ("P site") and addition site ("A site"), respectively, and binding an initiating entity in the RNAP active-center P site ( Figure 1A).
RNAPs can initiate transcription using either a primer-independent or primer-dependent mechanism (1-10). In primer-independent initiation, the initiating entity (typically a nucleoside triphosphate, NTP) base pairs to the template-strand nucleotide in the RNAP active-center P site (TSS; Figure 1A). In primer-dependent transcription initiation, the 3ʹ nucleotide of a 2-, 3-, or 4-nucleotide RNA primer (di-, tri-, or tetranucleotide primer, respectively) base pairs to the template-strand nucleotide in the RNAP active-center P site, and the 5ʹ nucleotide of the primer base pairs to the template-strand nucleotide in the P-1, P-2, or P-3 site (TSS-1, TSS-2, and TSS-3, respectively; Figure 1A).
In Escherichia coli cells, primer-dependent transcription initiation occurs during stationary-phase growth and modulates the expression of genes involved in biofilm formation (9)(10)(11). RNAs generated by primer-dependent initiation in E. coli contain a 5ʹ-end hydroxyl (5ʹ-OH), indicating that the primers incorporated at the RNA 5ʹ end also contain a 5ʹ-OH (9). Available data suggests that most primer-dependent initiation in E. coli involves use of dinucleotide primers, most frequently UpA and GpG (9)(10)(11). However, direct evidence that dinucleotides serve as the predominant initiating entity in primer-dependent initiation has not been presented. In addition, apart from the sequences complementary to the primer, "the primer binding site," promoter-sequence determinants for primer-dependent initiation have not been defined.
Here we adapt a massively parallel reporter assay to monitor primer-dependent initiation in E. coli. The results provide a complete inventory of the RNA 5ʹ-end sequences generated by primer-dependent initiation in E. coli and define the critical promoter-sequence determinants for primer-dependent initiation. The results demonstrate that most, if not all, primer-dependent initiation in E. coli involves use of a dinucleotide as the initiating entity and identify a consensus sequence for primer-dependent initiation, $Y_{TSS-2}N_{TSS-1}N_{TSS}W_{TSS+1}$, where TSS is the transcription start site, $N_{TSS-1}N_{TSS}$ is the primer binding site, Y is pyrimidine, and W is A or T. We further demonstrate that sequence information at the position immediately upstream of the primer binding site resides exclusively in the template strand of the transcription bubble ($R_{TSS-2}$, where R is purine). We report crystal structures of transcription initiation complexes containing dinucleotide primers that reveal the structural basis for a purine at the template-strand position immediately upstream of the primer binding site ($R_{TSS-2}$): namely, more extensive, and likely more energetically favorable, base stacking between the template-strand base and the primer 5ʹ base.

Use of massively systematic transcript end readout, "MASTER," to monitor primer-dependent initiation in E. coli
To define, comprehensively, the promoter-sequence determinants for primer-dependent initiation in E. coli, we modified a massively parallel reporter assay previously developed in our lab termed massively systematic transcript end readout, "MASTER" (12,13), in order to detect both primer-independent and primer-dependent initiation, to differentiate between primer-independent and primer-dependent initiation, and to define primer lengths in primer-dependent initiation ( Figure 1B).
MASTER involves construction of a promoter library that contains up to $4^{11}$ (~4,000,000) barcoded sequences, production of RNA transcripts from the promoter library, and analysis of RNA barcodes and RNA 5′ ends using high-throughput sequencing (5′ RNA-seq) to define, for each RNA product, the template that produced the RNA and the sequence of the RNA 5′ end (Figure 1B) (12)(13)(14).
The 5′ RNA-seq procedure used in MASTER relies on ligation of single-stranded oligonucleotide adaptors to RNAs containing a 5′-end monophosphate (5′-p) (13). In previous work, we treated RNAs with RNA 5´ Pyrophosphohydrolase (Rpp), which converts a 5′-end triphosphate (5ʹ-ppp) to a 5′-p; this procedure specifically enables detection of the 5ʹ-ppp-bearing RNAs generated by primer-independent initiation (12,14-16). Here, we treated RNAs, in parallel, with Rpp to detect RNAs generated by primer-independent initiation and with Polynucleotide Kinase (PNK), which converts a 5ʹ-OH to a 5′-p, to detect RNAs generated by primer-dependent initiation (Figure 1B). By comparing the results from Rpp and PNK reactions we quantify, for each promoter sequence in the library, the relative efficiencies of primer-independent and primer-dependent initiation, and primer lengths for primer-dependent initiation.
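The detection logic of the parallel treatments can be summarized in a few lines of Python. This is only an illustrative sketch: the names `CONVERTS`, `detected_ends`, and `is_detected` are hypothetical, and the "mock" and "PNK+Rpp" conditions are taken from the Materials and Methods rather than from this paragraph.

```python
# Which original 5'-end types become 5'-p (and hence adaptor-ligatable)
# after each treatment. Rpp converts 5'-ppp to 5'-p; PNK converts 5'-OH to 5'-p.
CONVERTS = {
    "mock": set(),             # no conversion: only pre-existing 5'-p RNAs ligate
    "Rpp": {"ppp"},            # detects primer-independent (5'-ppp) products
    "PNK": {"OH"},             # detects primer-dependent (5'-OH) products
    "PNK+Rpp": {"ppp", "OH"},  # detects both classes
}

def detected_ends(treatment):
    """Return the set of original 5'-end types that can be adaptor-ligated."""
    return {"p"} | CONVERTS[treatment]

def is_detected(end, treatment):
    """True if an RNA with the given 5' end is captured after the treatment."""
    return end in detected_ends(treatment)
```

Comparing read counts between treatments that differ in exactly one converted end type is what lets the two classes of RNA be told apart.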
In the present work, we used a MASTER template library containing $4^{10}$ (~1,000,000) sequence variants at the positions 1-10 base pairs (bp) downstream of the -10 element of a consensus σ70-dependent promoter (placCONS-N10; Figure 1B). The randomized segment of placCONS-N10 contains the full range of TSS positions for E. coli RNAP observed in previous work (i.e., TSS positions located 6, 7, 8, 9, and 10 bp downstream of the promoter -10 element; Figure 2A). We introduced the placCONS-N10 library into E. coli, grew cells to stationary phase (the phase in which primer-dependent initiation has been observed in previous work; 9), isolated total cellular RNA, and analyzed RNAs generated from each promoter sequence in the library by 5′ RNA-seq. The results provide complete inventories of RNA 5ʹ ends generated by primer-independent initiation and primer-dependent initiation in stationary-phase E. coli cells.
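As a quick check of the library sizes quoted above, the number of variants of a fully randomized segment of n positions is 4 to the power n. The sketch below uses hypothetical helper names (`library_size`, `variants`) and enumerates sequences lazily, so the full library is never held in memory.

```python
from itertools import islice, product

def library_size(n_random):
    """Number of distinct sequences for n fully randomized positions."""
    return 4 ** n_random

def variants(n_random):
    """Lazily yield every sequence of the randomized segment."""
    for bases in product("ACGT", repeat=n_random):
        yield "".join(bases)

# Peek at the first three N10 variants without materializing the library.
first = list(islice(variants(10), 3))
```

Here `library_size(10)` gives 1,048,576 and `library_size(11)` gives 4,194,304, which match the rounded values of ~1,000,000 and ~4,000,000 used in the text.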

Primer-dependent initiation: 5′-end positions
Our results define distributions of 5′-end positions of the 5′-ppp RNAs generated by primer-independent initiation and the 5′-OH RNAs generated by primer-dependent initiation for transcription in stationary-phase E. coli cells ( Figure 2B).
The distributions of 5′-end positions for primer-independent initiation show 5′-end positions (TSS positions) ranging from 6-10 bp downstream of the promoter -10 element, with a mean 5′-end position ~7.5 bp downstream of the promoter -10 element ( Figure 2B, top). The range, the mean, and the distribution shape closely match those previously observed for primer-independent initiation for cells in exponential phase (12).
The distributions of 5′-end positions for primer-dependent initiation show 5′-end positions ranging from 5-9 bp downstream of the promoter -10 element, with a mean 5′-end position ~6.8 bp downstream of the promoter -10 element ( Figure 2B, bottom). The range, the mean, and the distribution shape closely match those for primer-independent initiation, but with a ~1 bp upstream shift.

Primer-dependent initiation: primer lengths
Comparison of the 5′-end distributions for primer-independent initiation (Figure 2B, top) vs. primer-dependent initiation (Figure 2B, bottom) indicates that, across all promoter sequences in the library, the 5′-end positions of RNAs generated by primer-independent initiation (mean position 7.54 ± 0.01 bp downstream of -10 element) and RNAs generated by primer-dependent initiation (mean position 6.75 ± 0.05 bp downstream of -10 element) differ by ~1 bp (0.71 ± 0.06 bp). Following the logic of Figure 1, based on the observed difference of almost exactly 1 bp in mean 5′-end position for primer-independent initiation vs. primer-dependent initiation, we infer that primer length in primer-dependent initiation in stationary-phase E. coli cells is almost always 2 nt. Computational modeling, using the distributions in Figure 2B, indicates that no more than ~2.5% of the observed primer-dependent initiation could involve primer lengths greater than 2 nt (Figure S1A). Consistent with these inferences, comparison of distributions of RNA 5′-end positions for primer-independent initiation in vitro vs. primer-dependent initiation in vitro with the dinucleotide primer UpA shows essentially the same ~1 bp upstream shift in distribution range, mode, and mean (Figures 2C, S1B).
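The inference from the ~1 bp shift follows from simple geometry: the 3ʹ nucleotide of the primer pairs at the TSS, so a primer of length n places the RNA 5ʹ end n - 1 positions upstream of the TSS. A minimal sketch, in which the function name `five_prime_end` is hypothetical:

```python
def five_prime_end(tss, primer_len=1):
    """Position (bp downstream of the -10 element) of the RNA 5' end.

    primer_len=1 models an initiating NTP (primer-independent initiation);
    primer_len=2, 3, or 4 models di-, tri-, or tetranucleotide primers.
    """
    return tss - (primer_len - 1)

# TSS positions 6-10 with a dinucleotide primer map to 5'-end positions 5-9,
# i.e., the same distribution shifted upstream by 1 bp.
dinucleotide_ends = [five_prime_end(t, primer_len=2) for t in range(6, 11)]
```

Under this model, trinucleotide or tetranucleotide primers would shift the distribution by 2 or 3 bp, which is why a shift of about 1 bp points to dinucleotides.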

Primer-dependent initiation: primer sequences
We next measured yields of 5ʹ-OH RNAs generated by primer-dependent initiation with each of the 16 possible dinucleotide primers ( Figure 3A). The results show that primer-dependent initiation occurs with all 16 dinucleotide primers. Highest levels of primer-dependent initiation are observed with the dinucleotide primers UpA and GpG, which account for ~27% and ~17%, respectively, of 5ʹ-OH RNAs generated across all promoters in the library ( Figure 3A, left). The other 14 dinucleotide primers each account for ~1% to ~8% of 5ʹ-OH RNAs generated across all promoters in the library. Qualitatively similar results are obtained analyzing RNA products from promoters where the primer binding site is at positions 5-6, 6-7, 7-8, 8-9, and 9-10 bp downstream of the promoter -10 element ( Figure 3A, right). The demonstration that primer-dependent initiation occurs with all 16 dinucleotides in vivo is new to this work, as is the demonstration that primer-dependent initiation occurs at the full range of TSS positions observed for primer-independent initiation in vivo (i.e., TSS positions located 6,7,8,9, and 10 bp downstream of the promoter -10 element). The observation that UpA and GpG are preferentially used as primers in vivo is consistent with results of prior work (9,10).
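The percentages above can be checked for internal consistency with a short calculation. The sketch below enumerates the 16 dinucleotide primers and computes the average share left for the 14 primers other than UpA and GpG; variable names are hypothetical and the shares are the rounded values quoted in the text.

```python
from itertools import product

# All 16 possible dinucleotide primers, written 5'-NpN-3'.
PRIMERS = [f"{a}p{b}" for a, b in product("ACGU", repeat=2)]

share_upa = 27.0  # approximate % of 5'-OH RNAs primed by UpA (from the text)
share_gpg = 17.0  # approximate % of 5'-OH RNAs primed by GpG (from the text)

# Mean share of the remaining 14 primers.
mean_other = (100.0 - share_upa - share_gpg) / (len(PRIMERS) - 2)
```

The mean of 4% for the other primers falls inside the reported range of ~1% to ~8%.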

Primer-dependent initiation: promoter-sequence dependence, primer binding site
Analysis of the results for primer-dependent initiation, separately considering RNA products with 5ʹ ends complementary to the template and RNA products with 5ʹ ends non-complementary to the template, shows that the overwhelming majority of primer-dependent initiation in stationary-phase E. coli cells occurs at primer binding sites that have perfect template-strand complementarity to the 5ʹ and 3ʹ nucleotides of the dinucleotide primer (93.3 ± 0.4%; Figure 3B, bottom). This is true, across the entire promoter library, for each of the 16 possible dinucleotide primer sequences (73.9 ± 0.2% to 98.1 ± 0.01% of primer binding sites with perfect complementarity; Figure 3B, bottom), and for each of the major primer binding-site positions ( Figure S2). Most of the limited non-complementarity observed involves the 5ʹ nucleotide of the dinucleotide primers CpG, UpG, CpU and UpU (24.2 ± 0.4%, 21.1 ± 0.6%, 10.3 ± 0.6%, and 10.1 ± 0.9%, respectively; Figures 3B, S2). Consistent with these results, analysis of the same promoter library in vitro, assessing primer-dependent initiation with the dinucleotide primer UpA, shows that the overwhelming majority of primer-dependent initiation likewise occurs at primer binding sites that have perfect template-strand complementarity to the 5ʹ and 3ʹ nucleotides of the dinucleotide primer for each of the major primer binding-site positions (84.1 ± 0.6%; Figures 3B, bottom right, S3). In vitro transcription experiments using heteroduplex templates (templates having non-complementary transcription-bubble nontemplate-strand and template-strand sequences) and the dinucleotide primer UpA show that the strong preference for perfect template-strand complementarity to the 5ʹ and 3ʹ nucleotides of the dinucleotide primer reflects a requirement for Watson-Crick base pairing of template-strand nucleotides at positions TSS-1 and TSS with the 5ʹ and 3ʹ nucleotides of the dinucleotide RNA primer, respectively ( Figure S4).
We conclude that primer-dependent initiation with a dinucleotide primer almost always involves a primer binding site having perfect template-strand complementarity to, and therefore able to engage in Watson-Crick base pairing with, the dinucleotide primer. This result is not completely unexpected.
However, this point has not been demonstrated previously in vivo, and prior work in vitro, with tetranucleotide primers (17), had indicated that perfect template-strand complementarity to the primer may not be necessary for primer-dependent initiation with longer primers.
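The complementarity criterion used in this section can be written as a small predicate. This is an illustrative sketch with hypothetical names; it takes the template-strand bases at TSS-1 and TSS and tests Watson-Crick pairing with the 5ʹ and 3ʹ nucleotides of the primer, respectively.

```python
# Template-strand DNA base that pairs with each RNA base.
RNA_TO_TEMPLATE = {"A": "T", "U": "A", "G": "C", "C": "G"}

def pairs_with_template(primer, template_tss_minus1, template_tss):
    """True if a dinucleotide primer (e.g. 'UpA', written 5'->3') is perfectly
    complementary to the template-strand bases at TSS-1 and TSS."""
    five_prime, three_prime = primer[0], primer[2]
    return (RNA_TO_TEMPLATE[five_prime] == template_tss_minus1
            and RNA_TO_TEMPLATE[three_prime] == template_tss)
```

For example, UpA is complementary to a template strand with A at TSS-1 and T at TSS, which corresponds to T at TSS-1 and A at TSS on the nontemplate strand.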

Primer-dependent initiation: promoter-sequence dependence, sequences flanking the primer binding site
The observed yields of 5ʹ-OH RNA products from primer-dependent initiation in stationary-phase [...]
In vitro transcription experiments assessing competition between primer-dependent transcription initiation with UpA and primer-independent transcription initiation with ATP show that primer-dependent initiation is ~60 times more efficient than primer-independent initiation at a promoter conforming to the consensus sequence ($T_{TSS-2}T_{TSS-1}A_{TSS}T_{TSS+1}$; Figure 5A), but is only ~10 times more efficient at a promoter not having the consensus sequence ($G_{TSS-2}T_{TSS-1}A_{TSS}T_{TSS+1}$; Figure 5A).
In vitro transcription experiments using heteroduplex templates and the dinucleotide primer UpA show that the sequence information responsible for the preference for Y:R at TSS-2 resides exclusively in the DNA template strand ( Figure 5B). Thus, in experiments with heteroduplex templates, primer-dependent initiation is reduced by replacement of the consensus nucleotide by a non-consensus nucleotide or an abasic site on the DNA template strand, but is not reduced by replacement of the consensus nucleotide by a non-consensus nucleotide or an abasic site on the DNA nontemplate strand ( Figure 5B).
We conclude that primer-dependent initiation, in vivo and in vitro, depends not only on the sequence of the primer binding site, but also on flanking sequences, with the preferred sequence being $Y_{TSS-2}N_{TSS-1}N_{TSS}W_{TSS+1}$ ($Y{:}R_{TSS-2}\,N{:}N_{TSS-1}\,N{:}N_{TSS}\,W{:}W_{TSS+1}$).
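The consensus can be expressed as a pattern over the nontemplate strand: pyrimidine (C or T) at TSS-2, any base at TSS-1 and TSS, and A or T at TSS+1. The sketch below is one hypothetical way to test a sequence against it; the function name and the 0-based `tss_index` convention are assumptions made for illustration.

```python
import re

# Y = C/T at TSS-2, N = any base at TSS-1 and TSS, W = A/T at TSS+1.
CONSENSUS = re.compile(r"[CT][ACGT][ACGT][AT]")

def matches_consensus(nontemplate, tss_index):
    """True if positions TSS-2..TSS+1 of a nontemplate-strand sequence
    (5'->3', with the TSS at 0-based tss_index) fit the YNNW consensus."""
    if tss_index < 2:
        return False  # not enough upstream sequence to evaluate TSS-2
    window = nontemplate[tss_index - 2 : tss_index + 2]
    return CONSENSUS.fullmatch(window) is not None
```

With the two promoters compared in the in vitro competition experiments, `matches_consensus("TTAT", 2)` is true and `matches_consensus("GTAT", 2)` is false.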

Primer-dependent initiation: chromosomal promoters
To assess whether the sequence preferences observed in the MASTER analysis apply also to natural promoters, we quantified primer-dependent initiation in stationary-phase E. coli cells at each of 93 promoters that use UpA as primer (Table S1, Figure 6A). The results show the same sequence preferences at positions TSS-2 and TSS+1 observed in the MASTER analysis are observed in chromosomally-encoded promoters ( Figure 6A).
To assess directly the functional significance of the sequence preferences observed in the MASTER analysis and natural promoter analysis, we constructed mutations at positions TSS-2 and TSS+1 of a natural promoter that uses UpA as primer ( Figure 6B, top) and assessed effects on function in stationary-phase E. coli cells ( Figure 6B, bottom). We observed that, at position TSS-2, the consensus base pair T:A is preferred over the non-consensus base pair G:C by a factor of ~5 ( Figure 6B, bottom), and, at position TSS+1, the consensus base pair T:A is preferred over the non-consensus base pair G:C by a factor of ~2.5 ( Figure 6B, bottom). We conclude that the sequence dependence for primer-dependent initiation defined using MASTER is also observed in natural, chromosomally-encoded E. coli promoters.
To determine the structural basis of the preference for a template-strand purine nucleotide at position TSS-2 ($R_{TSS-2}$) in primer-dependent initiation, we determined crystal structures of transcription initiation complexes containing a template-strand purine nucleotide at position TSS-2 (Figure 7). We first prepared crystals of Thermus thermophilus RPo using synthetic nucleotide scaffolds containing a template-strand purine nucleotide, A, at position TSS-2, and containing a template-strand primer binding site for either the dinucleotide primer used most frequently in primer-dependent initiation in vivo, UpA (9, 10; Figure 3A), or the dinucleotide primer used second most frequently in primer-dependent initiation in vivo, GpG (9, 10; Figure 3A). We next soaked the crystals either with UpA and CMPcPP or with GpG and CMPcPP, to yield crystals of T. thermophilus RPo in complex with a dinucleotide primer and a non-reactive analog of an extending NTP. We then collected X-ray diffraction data, solved structures, and [...]
Our structural results show that the sequence preference for purine at template-strand position TSS-2 is a consequence of inter-chain base stacking of a purine at template-strand position TSS-2 with the 5ʹ nucleotide of a dinucleotide primer (Figures 7, S7). The structural basis of the preference for purine vs. pyrimidine at template-strand position TSS-2 in primer-dependent initiation is analogous to, and almost identical to, the previously described structural basis of the preference for purine vs. pyrimidine at template-strand position TSS-1 in primer-independent initiation (19,20). In the former case, inter-chain base stacking between the primer 5ʹ base in the RNAP active-center P-1 site and a purine at template-strand position TSS-2 facilitates binding of the primer and an extending NTP. In the latter case, inter-chain base stacking between the initiating NTP in the RNAP active-center P site and a purine at template-strand position TSS-1 facilitates binding of the initiating NTP and an extending NTP.

Promoter sequences upstream of the TSS modulate the chemical nature of the RNA 5ʹ end
Chemical modifications of the RNA 5ʹ end provide a layer of "epitranscriptomic" regulation, influencing multiple aspects of RNA fate, including stability, processing, localization, and translation efficiency (21)(22)(23)(24). Primer-dependent initiation provides one mechanism to alter the RNA 5ʹ end during transcription initiation. In primer-dependent initiation with a dinucleotide primer, the RNA product acquires a 5ʹ hydroxyl and acquires one additional nucleotide at the RNA 5ʹ end (Figure 1). In an analogous manner, NCIN-dependent initiation--where an NCIN is a non-canonical initiating nucleotide--provides another mechanism to alter the RNA 5ʹ end during transcription initiation (25,26).

Materials and Methods
Proteins
E. coli RNAP core enzyme used in transcription experiments was prepared from E. coli strain NiCo21(DE3) (New England Biolabs, NEB) transformed with plasmid pIA900 (31) using culture and induction procedures, immobilized-metal-ion affinity chromatography on Ni-NTA agarose, and affinity chromatography on Heparin HP as described in (32). E. coli σ70 was prepared from E. coli strain NiCo21(DE3) transformed with plasmid pσ70-His using culture and induction procedures, immobilized-metal-ion affinity chromatography on Ni-NTA agarose, and anion-exchange chromatography on Mono Q as described in (33). 10x RNAP holoenzyme was formed by mixing 0.5 µM RNAP core and 2.5 µM σ70 in 1x reaction buffer (40 mM Tris HCl, pH 7.5; 10 mM MgCl2; 150 mM KCl; 0.01% Triton X-100; and 1 mM DTT).

Oligonucleotides
Oligodeoxyribonucleotides (Table S3) were purchased from Integrated DNA Technologies (IDT) and were purified with standard desalting purification. UpA and GpG were purchased from Trilink Biotechnologies. NTPs (ATP, GTP, CTP, and UTP) were purchased from GE Healthcare Life Sciences.
Homoduplex and heteroduplex templates used in single-template in vitro transcription assays were generated by mixing 1.1 µM nontemplate-strand oligo with 1 µM template-strand oligo in 10 mM Tris (pH 8.0). Mixtures were heated to 90°C for 10 min and slowly cooled to 40°C (0.1°C / second) using a Dyad PCR machine (Bio-Rad).
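The annealing parameters above imply a cooling time and a strand excess that are easy to compute. The following is a small arithmetic sketch with hypothetical function names, using only the values stated in the text.

```python
def ramp_seconds(start_c, end_c, rate_c_per_s):
    """Time needed to cool from start_c to end_c at a constant rate."""
    return (start_c - end_c) / rate_c_per_s

def percent_excess(nontemplate_um, template_um):
    """Molar excess of nontemplate strand over template strand, in percent."""
    return 100.0 * (nontemplate_um - template_um) / template_um

cooling_time = ramp_seconds(90, 40, 0.1)  # about 500 s, i.e., a little over 8 min
excess = percent_excess(1.1, 1.0)         # about 10% excess of nontemplate strand
```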

Primer-dependent initiation in vitro: transcription reaction conditions
A linear DNA fragment containing placCONS-N10 generated as described in (34) [...]
The gel was stained with SYBR Gold nucleic acid gel stain (Invitrogen), bands were visualized on a UV transilluminator, and RNA products ~150 nt in length were excised from the gel. The excised gel slice was crushed, 300 μl of 0.3 M NaCl in 1x TE buffer was added, and the mixture was incubated at 70°C for 10 min. Eluted RNAs were collected using a Spin-X column (Corning). After the first elution, the crushed gel fragments were collected and the elution procedure was repeated. Nucleic acids were collected, pooled with the first elution, isolated by isopropanol precipitation, and resuspended in 25.5 μl of RNase-free water (Invitrogen). Reactions were performed in triplicate.

Primer-dependent initiation in stationary-phase E. coli cells: cell growth
Three independent 25 ml cell cultures of E. coli MG1655 cells (gift of A. Hochschild, Harvard Medical School) containing placCONS-N10 and pPSV38 were grown in LB media (Millipore) containing chloramphenicol (25 μg/ml), gentamicin (10 μg/ml), and IPTG (1 mM) in a 125 ml DeLong flask (Bellco Glass) shaken at 220 RPM at 37°C until late stationary phase (~21 hours after entry into stationary phase; final OD600 ~3.5). 2 ml aliquots of cell suspensions were placed in 2 ml tubes and cells were collected by centrifugation (1 min; 21,000 x g; 20°C). Supernatants were removed and cells stored at -80°C.

Primer-dependent initiation in stationary-phase E. coli cells: RNA isolation
RNA was isolated from frozen cell pellets as described in (12). Cell pellets were resuspended in 600 µl of TRI Reagent solution (Molecular Research Center), incubated at 70°C for 10 min, and centrifuged (10 min; 21,000 x g; 4°C) to remove insoluble material. The supernatant was transferred to a fresh tube, ethanol was added to a final concentration of 60.5%, and the mixture was applied to a Direct-zol spin column (Zymo Research). DNase I (Zymo Research) treatment was performed on-column according to the manufacturer's recommendations. RNA was eluted from the column using nuclease-free water heated to 70°C (3 x 30 µl elutions; total volume of eluate = 90 µl). RNA was treated with 2 U TURBO DNase (Invitrogen) at 37°C for 1 h, samples were extracted with acid phenol:chloroform (Ambion), RNA was recovered by ethanol precipitation and resuspended in RNase-free water. A MICROBExpress Kit (Invitrogen) was used to remove rRNAs from ~36 µg of recovered RNA, rRNA-depleted RNA was isolated by ethanol precipitation and resuspended in 40 µl of RNase-free water.

Enzymatic treatment of RNA products
For RNAs isolated from E. coli, 3 µg of rRNA-depleted RNA was used in each reaction. RNAs [...]
Rpp and PNK treatment: RNA products were mixed with 20 U PNK, 40 U RNaseOUT, and 1 mM ATP in 1x PNK reaction buffer (total reaction volume = 50 µl) and incubated at 37°C for 1 hr.
Processed RNAs were recovered using Qiagen's RNeasy MinElute kit (following the manufacturer's recommendations with the exception that RNAs were eluted from the column using 25 µL nuclease-free water heated to 70°C). Recovered RNA products were mixed with 20 U Rpp and 40 U RNaseOUT in 1x Rpp reaction buffer (total reaction volume = 30 µl) and incubated at 37°C for 1 hr. Reactions were extracted with acid phenol:chloroform, RNA was recovered by ethanol precipitation, and resuspended in 10.5 μl RNase-free water.
"mock" PNK treatment (total reaction volume = 50 µl): RNA products were mixed with 40 U RNaseOUT and 1 mM ATP in 1x PNK reaction buffer and incubated at 37°C for 1 hr. Reactions were extracted with acid phenol:chloroform, RNA was recovered by ethanol precipitation, and resuspended in 10.5 μl RNase-free water.

5'-adaptor ligation
To enable quantitative comparisons between samples, we performed the 5'-adaptor ligation step using barcoded 5'-adaptor oligonucleotides as described in (16). For RNA products isolated from stationary-phase E. coli cells, oligo i105 was used for RNAs processed by Rpp, oligo i106 was used for RNAs processed with PNK, oligo i107 was used for RNAs processed with both Rpp and PNK, and oligo i108 was used for unprocessed RNAs (mock PNK treated). For RNA products isolated from in vitro reactions, oligo i105 was used for RNAs processed by Rpp, oligo i106 was used for unprocessed RNAs (mock Rpp treated), oligo i107 was used for RNAs processed by PNK, and oligo i108 was used for unprocessed RNAs (mock PNK treated). [...] 10% 7M urea slab gels (equilibrated and run in 1x TBE). Gels were incubated with SYBR Gold nucleic acid gel stain, and bands were visualized with UV transillumination. For RNAs isolated from stationary-phase E. coli cells, products migrating above the 5'-adaptor oligo were isolated from the gel (procedures as above; recovered in 50 μl of nuclease-free water). For RNAs generated in vitro, products ~150 nt in length were recovered from the gel (procedures as above; recovered in 16 μl of nuclease-free water). 5'-adaptor-ligated RNAs were used for analysis of primer-dependent initiation from placCONS-N10 (this section) and for analysis of primer-dependent initiation from natural, chromosomally-encoded promoters (next section).

First strand cDNA synthesis
For RNAs isolated from stationary-phase E. coli cells, 25 μl of 5'-adaptor-ligated RNAs were mixed with 1.5 μl s128A oligonucleotide (3 μM) and 3.5 μl nuclease-free water. The 30 μl mixture was incubated at 65°C for 5 min, cooled to 4°C, and combined with 20 μl of a solution containing 10 μl of 5x [...] Nucleic acids were separated by electrophoresis on 10% 7M urea slab gels (equilibrated and run in 1x TBE). Gels were incubated with SYBR Gold nucleic acid gel stain, bands were visualized with UV transillumination, and species ~80 to ~150 nt in length were recovered from the gel (procedure as above) and recovered in 20 μl of nuclease-free water.

High-throughput sequencing
Barcoded libraries were pooled and sequenced on an Illumina NextSeq platform in high-output mode using custom sequencing primer s1115.

Sample serial numbers
Samples KS112-KS114 are cDNA derived from RNA products generated in stationary-phase E. coli [...]

Data analysis: separation of RNA 5'-end sequences by enzymatic treatment, promoter sequence, and promoter position
RNA 5'-end sequences were associated with an enzymatic treatment using the 4-nt barcode sequence acquired upon ligation of the 5'-adaptor (see above) as described in (16). RNA 5'-end sequences were associated with a placCONS promoter sequence using transcribed-region barcode assignments derived from the analysis of sample Vv945 described in (14). RNA 5'-end sequences that could be aligned to their template of origin with no mismatches were used for results presented in Figures 2, 3A, 4, S1, S5, and S6. RNA 5'-end sequences with mismatches at the first and/or second base of the 5'-end were also included for results shown in Figures 3B, S2-S3.

Data analysis: 5'-end distribution histograms (Figures 2, S1B, S8)
The number of 5'-end sequences emanating from each position 4 to 10 bp downstream of the -10 element of placCONS was determined for each of the ~$4^{10}$ (~1,000,000) promoter sequences. These counts are represented using four vectors, $\vec{c}_{PNK+Rpp}$, $\vec{c}_{Rpp}$, $\vec{c}_{PNK}$, and $\vec{c}_{mock}$, which represent the number of counts observed for 5'-ends at positions 4, 5, …, 10 for each enzymatic treatment. We initially estimated the number of 5'-ppp RNAs and 5'-OH RNAs in two different ways: [...] These four read count distributions were computed separately for each of the three replicates. To visualize these distributions, we normalized each counts vector by its sum across all positions, i.e., each $\vec{c}_k$ was divided by the sum of its entries, for $k \in \{\mathrm{ppp1}, \mathrm{ppp2}, \mathrm{OH1}, \mathrm{OH2}\}$.
The 5'-OH distributions that resulted exhibited two obvious problems (Figure S8A, left): there was substantial variation across replicates, and four of the distributions exhibited probabilities well below zero at two positions (8 and 9). We reasoned that these defects might be artefacts resulting from the enzymatic treatments not being 100% efficient, and that accounting for these inefficiencies might lead to more accurate 5'-OH distributions. Let $\epsilon_{PNK+Rpp}$, $\epsilon_{PNK}$, $\epsilon_{Rpp}$, and $\epsilon_{mock}$ denote the efficiencies of the four enzymatic treatments. Then the number of true underlying counts in the four samples becomes [...] One can solve for the unknown efficiencies by setting $\vec{c}_{ppp1} = \vec{c}_{ppp2}$, or equivalently $\vec{c}_{OH1} = \vec{c}_{OH2}$, both of which give [...] This provides a system of 7 equations with 4 unknowns, and setting $\epsilon_{mock} = 1$ allowed us to solve for the other three (now relative) efficiencies. We did this by minimizing the objective function [...], where [...] are vectors that respectively represent the residuals and Poisson-estimated variances (Figure S8B). Using these efficiencies, we computed the corrected read count distributions (Figure S8A, right). The resulting 5'-OH distributions were much more reproducible across replicates. Moreover, a negative probability was estimated only for position 9 for $\vec{c}_{OH2}$ of replicate 2, and even this was much closer to zero than in the uncorrected profiles (Figure S8A, right). We therefore chose to compute 5'-ppp and 5'-OH read counts using the averages [...] These averaged distributions were used in the in vivo MASTER analysis shown in Figures 2 and S1A, as were analogous formulas for analyzing primer usage and for generating the sequence logos shown in [...]
To carry out the mixture modeling of 5'-end distributions, we defined a 7x9 matrix $M$, whose entries $M_{ij}$ represent the fraction of reads with 5'-ends at positions 4-10 (corresponding to $i = 1, 2, \ldots, 7$) within the 5'-ppp distribution shifted by -4, -3, ..., +4 nt (corresponding to $j = 1, 2, \ldots, 9$). Note that these shifted distributions were normalized so that $\sum_i M_{ij} = 1$ for every column $j$. We further defined a 7x1 vector $\vec{h}$ representing the 5'-OH distribution, normalized so that $\sum_i h_i = 1$. We then inferred a 9x1 vector of mixture coefficients $\vec{\alpha}$ by solving the constrained least squares problem $\vec{\alpha}^* = \mathrm{argmin}_{\vec{\alpha}} \lVert M\vec{\alpha} - \vec{h} \rVert$, under the constraints that all $\alpha_j \geq 0$ and that $\sum_j \alpha_j = 1$. The resulting mixture distribution $\vec{h}^* = M\vec{\alpha}^*$ is shown in Figure S1A (left panel), alongside the 5'-OH distribution $\vec{h}$. The corresponding mixture coefficients are shown in Figure S1A (right panel). The residual deviation was computed as [...]
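The following Python sketch illustrates the two computational ideas in this section. It is not the authors' code, and all function and variable names are hypothetical. The subtraction scheme in `estimate_ends` is an assumption inferred from what each enzymatic treatment detects, since the corresponding equations are not reproduced in the text. The mixture part builds the matrix of shifted, renormalized 5'-ppp distributions and evaluates the least-squares residual for a given coefficient vector; it does not implement the constrained solver itself.

```python
def estimate_ends(c_mock, c_rpp, c_pnk, c_both):
    """Two estimates each of 5'-ppp and 5'-OH counts per position.

    ASSUMED scheme: Rpp adds ppp-derived reads and PNK adds OH-derived reads
    on top of the mock background; c_both is the PNK+Rpp sample.
    """
    def sub(a, b):
        return [x - y for x, y in zip(a, b)]
    return {
        "ppp1": sub(c_rpp, c_mock),
        "ppp2": sub(c_both, c_pnk),
        "OH1": sub(c_pnk, c_mock),
        "OH2": sub(c_both, c_rpp),
    }

def shifted_columns(ppp, shifts=range(-4, 5)):
    """Columns of the mixture matrix: the 5'-ppp distribution moved by each
    shift (negative = upstream), truncated to the window and renormalized."""
    n = len(ppp)
    cols = []
    for s in shifts:
        col = [ppp[i - s] if 0 <= i - s < n else 0.0 for i in range(n)]
        total = sum(col)
        cols.append([v / total for v in col] if total > 0 else col)
    return cols

def mixture(cols, alpha):
    """Weighted sum of the shifted distributions (the product of the matrix
    and the coefficient vector)."""
    n = len(cols[0])
    return [sum(a * col[i] for a, col in zip(alpha, cols)) for i in range(n)]

def residual(cols, alpha, h):
    """Euclidean norm of the difference between the mixture and h."""
    return sum((m - x) ** 2 for m, x in zip(mixture(cols, alpha), h)) ** 0.5
```

As a usage example, if the 5'-OH distribution is exactly the 5'-ppp distribution shifted 1 nt upstream, placing all weight on the -1 shift (the fourth coefficient) gives a residual of essentially zero. Finding the optimal nonnegative coefficients that sum to 1 would require a constrained optimizer, which is left out of this sketch.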

Data analysis: in vivo sequence logos (Figures 4, S5)
Sequence logos illustrate the sequence-dependent log2 likelihood of primer-dependent initiation for primer binding sites 6-7, 7-8, and 8-9. All logos were created using Logomaker (35). The specific quantities illustrated in the logos were computed as follows.
To generate the logos shown in Figures 4 (left) … These estimated counts were used to generate logos for promoter sequences with a UpA binding site at a given pair of adjacent positions using the procedure described above.
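As a rough illustration of the quantity plotted in the logos, the sketch below computes, for each position and nontemplate-strand base, the log2 of the average % 5'-OH RNA across sequences carrying that base at that position, following the wording of the figure legends. Reading "log2 average" as the log2 of the mean (rather than the mean of log2 values) is an assumption, as are the function name, the input format, and the example values, which are invented for illustration only.

```python
import math
from collections import defaultdict

def logo_heights(records):
    """records: iterable of (sequence, pct_oh) pairs, all sequences the same length.
    Returns heights[pos][base] = log2(mean pct_oh over sequences with `base` at `pos`)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, pct in records:
        for pos, base in enumerate(seq):
            sums[(pos, base)] += pct
            counts[(pos, base)] += 1
    heights = defaultdict(dict)
    for (pos, base), total in sums.items():
        mean = total / counts[(pos, base)]
        if mean > 0:                      # log2 is undefined for a zero average
            heights[pos][base] = math.log2(mean)
    return dict(heights)

# Hypothetical 4-nt sequences with made-up % 5'-OH values.
records = [("TTAA", 8.0), ("CTAA", 4.0), ("ATAG", 1.0), ("GTAT", 1.0)]
heights = logo_heights(records)
print(heights[0])
```

A table of this shape (positions as rows, bases as columns) is the kind of input that a logo-drawing package such as Logomaker expects, once converted to a data frame.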

cDNA library construction and sequencing
Cell growth, RNA isolation, enzymatic treatments, and 5'-adaptor ligations were performed as … Reactions were incubated at 95°C for 5 min and cooled to 10°C; 4.5 μl of 1.2 M HCl was then added, followed by 60 μl of 2x RNA loading dye. Nucleic acids were separated by electrophoresis on 10%, 7 M urea slab gels (equilibrated and run in 1x TBE). Gels were incubated with SYBR Gold nucleic acid gel stain, bands were visualized with UV transillumination, and species ~80 to ~150 nt in length were isolated from the gel (procedure as above) and recovered in 20 μl of nuclease-free water.
cDNA amplification and high-throughput sequencing were performed as described above. Serial numbers for these samples are KS118-KS120.

Data analysis: chromosomal promoter sequence logo (Figure 6A)
Sequencing reads were associated with one of the four reaction conditions based on the identity of the 4-nt barcode sequence. RNA 5'-end sequences that could be aligned with no mismatches to the chromosomally-encoded promoter from which they were expressed were used for the results presented in Figure 6A.
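The read-assignment logic described above can be sketched as follows. This is only an illustration under stated assumptions: the barcode is assumed to occupy the first 4 nt of each read, the barcode-to-condition table and the sequences are hypothetical, and "aligned with no mismatches" is approximated as a unique exact substring match to the promoter sequence.

```python
def assign_condition(read, barcode_to_condition, barcode_len=4):
    """Split a read into (condition, insert); condition is None for unknown barcodes.
    Assumes the barcode is the first `barcode_len` nt of the read."""
    barcode, insert = read[:barcode_len], read[barcode_len:]
    return barcode_to_condition.get(barcode), insert

def exact_match_position(insert, promoter_seq):
    """Return the 0-based position of a unique, mismatch-free match, else None."""
    first = promoter_seq.find(insert)
    if first == -1:
        return None                                  # no perfect match
    if promoter_seq.find(insert, first + 1) != -1:
        return None                                  # ambiguous: matches more than once
    return first

# Hypothetical barcodes, condition labels, and promoter sequence.
barcodes = {"ACGT": "condition_1", "TGCA": "condition_2"}
promoter = "TTGACAGGCATTAA"

condition, insert = assign_condition("ACGTGGCATT", barcodes)
position = exact_match_position(insert, promoter)
print(condition, position)
```

Reads with an unrecognized barcode or without a unique perfect match return `None` and would simply be excluded from downstream counting in this sketch.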

RNA isolation
Cells were resuspended in 0.6 ml of TRI-Reagent, incubated at 70°C for 10 min, and the cell lysate was centrifuged to remove insoluble material (10 min; 21,000 x g; 4°C). The supernatant was transferred to a fresh tube, ethanol was added to a final concentration of 60.5%, and the mixture was applied to a Direct-zol spin column. DNase I treatment was performed on-column according to the manufacturer's recommendations. RNA was eluted from the column using nuclease-free water that had been heated to 70°C (3 x 30 µl elutions; total volume of eluate = 90 µl). RNA was treated with 2 U TURBO DNase at 37°C for 1 h to remove residual DNA. Samples were extracted with acid phenol:chloroform, and RNA was recovered by ethanol precipitation and resuspended in RNase-free water.

Primer extension analysis
Assays were performed essentially as described in (10). 10 µg of RNA was combined with 3 µM of primer k711 (5'-radiolabeled using PNK and [γ-32P]-ATP). The RNA-primer mixture was heated to 95°C for 10 min, slowly cooled to 40°C (0.1°C/s), incubated at 40°C for 10 min, and cooled to 4°C using a thermal cycler (Bio-Rad). Next, 10 U of AMV reverse transcriptase (NEB) was added, and reactions were incubated at 55°C for 60 min, heated to 90°C for 10 min, cooled to 4°C for 30 min, and mixed with 10 μl of 2x RNA loading buffer (95% formamide; 0.5 mM EDTA, pH 8; 0.025% SDS; 0.0025% bromophenol blue; and 0.0025% xylene cyanol). Nucleic acids were separated by electrophoresis on 8%, 7 M urea slab gels (equilibrated and run in 1x TBE) and radiolabeled products were visualized by storage phosphor imaging. Band assignments were made by comparison to a DNA sequence ladder prepared using primer k711 and pBEN516 as template (Affymetrix Sequenase DNA sequencing kit, version 2).

Structure determination (Figures 7, S7, Table S2)
The nucleic-acid scaffold for assembly of Thermus thermophilus RPo was prepared from synthetic oligonucleotides (Sangon Biotech) by an annealing procedure (95°C, 5 min followed by …). For each structure, T. thermophilus RPo was reconstituted by mixing T. thermophilus RNAP holoenzyme purified as described in (18) …

Figure 2. Distributions of 5′-end sequences for RNAs generated in primer-independent initiation and primer-dependent initiation in vivo and in vitro.
A. lacCONS-N10 library. Base pairs in the N10 region are numbered based on their position relative to the promoter -10 element. Colors as in Figure 1B.

Top: primer-dependent initiation involving template-strand complementarity to both 5ʹ and 3ʹ nucleotides of primer (TSS-1, TSS), template-strand complementarity to only 3ʹ nucleotide of primer (TSS), template-strand complementarity to only 5ʹ nucleotide of primer (TSS-1), or no template-strand complementarity to primer (none). Three vertical lines, complementarity; X, non-complementarity. Other symbols and colors as in Figure 1. Bottom: percentage of primer-dependent initiation involving complementarity to both 5ʹ and 3ʹ nucleotides of primer (TSS-1, TSS; pink), complementarity to only 3ʹ nucleotide of primer (TSS; purple), or template-strand complementarity to only 5ʹ nucleotide of primer or no template-strand complementarity to primer (TSS-1 or none; white) in stationary-phase E. coli cells (left) or in vitro, with the dinucleotide primer UpA (right) (mean ± SD, N = 3).

A. Sequence logo (35) for primer-dependent initiation at TSS positions 7, 8, and 9 (corresponding to primer binding sites 6-7, 7-8, and 8-9, respectively) in stationary-phase E. coli cells for 93 natural, chromosomally-encoded promoters that use UpA as a primer. The height of each base "X" at each position "Y" represents the log2 average of the % 5ʹ-OH RNAs computed across sequences containing nontemplate-strand X at position Y. Red, consensus nucleotides; black, non-consensus nucleotides. Other symbols and colors as in Figure 1.
B. Promoter-sequence dependence of primer-dependent initiation at the E. coli bhsA promoter. Top: sequences of DNA templates containing wild-type and mutant derivatives of bhsA promoter. Bottom: primer extension analysis of 5ʹ-end lengths of bhsA RNAs. In primer-dependent initiation with a dinucleotide primer, the RNA product acquires one additional nucleotide at the RNA 5ʹ end (Figure 1). Gel shows radiolabeled cDNA products derived from primer-independent initiation (5ʹ-ppp) and primer-dependent initiation (5ʹ-OH) in stationary-phase E. coli cells. Bottom right: ratios of primer-dependent initiation vs. primer-independent initiation (mean ± SD, N = 4).

Top: primer-dependent initiation involving template-strand complementarity to both 5ʹ and 3ʹ nucleotides of primer (TSS-1, TSS), template-strand complementarity to only 3ʹ nucleotide of primer (TSS), template-strand complementarity to only 5ʹ nucleotide of primer (TSS-1), or no template-strand complementarity to primer (none). Three vertical lines, complementarity; X, non-complementarity. Other symbols and colors as in Figure 1. Bottom: percentage of primer-dependent initiation involving complementarity to both 5ʹ and 3ʹ nucleotides of primer (TSS-1, TSS; pink), complementarity to only 3ʹ …

Figure S3. Promoter-sequence dependence of primer-dependent initiation in vitro: primer binding site
Top: primer-dependent initiation involving template-strand complementarity to both 5ʹ and 3ʹ nucleotides of UpA (TSS-1, TSS), template-strand complementarity to only 3ʹ nucleotide of UpA (TSS), template-strand complementarity to only 5ʹ nucleotide of UpA (TSS-1), or no template-strand complementarity to UpA (none). Three vertical lines, complementarity; X, non-complementarity. Other symbols and colors as in Figure 1. Bottom: percentage of primer-dependent initiation involving complementarity to both 5ʹ and 3ʹ nucleotides of UpA (TSS-1, TSS; pink), complementarity to only 3ʹ nucleotide of UpA (TSS; purple), or template-strand complementarity to only 5ʹ nucleotide of UpA or no template-strand complementarity to UpA (TSS-1 or none; white) for primer binding sites located 6-7, 7-8, or 8-9 base pairs downstream of the promoter -10 element (mean ± SD, N = 3).

Figure S4. Promoter-sequence dependence of primer-dependent initiation in vitro: template strand carries sequence information at positions TSS and TSS-1
A. Binding of ATP or UpA to template-strand nucleotides in primer-independent initiation (left) and primer-dependent initiation (right). Unwound transcription bubble in RPo indicated by raised and lowered nucleotides. Other symbols and colors as in Figure 1.
Unwound transcription bubble in RPo indicated by raised and lowered nucleotides. Bottom: radiolabeled initial RNA products generated using the indicated template in reactions containing UpA, ATP or GTP (panel B) or UpA (panel C).

Figure S5. Promoter-sequence dependence of primer-dependent initiation in vivo: sequences flanking the primer binding site
Sequence logo (35) for primer-dependent initiation in stationary-phase E. coli cells with each of the 16 dinucleotides at TSS positions 7, 8, and 9 (corresponding to primer binding sites 6-7, 7-8, and 8-9, respectively). The height of each base "X" at each position "Y" represents the log2 average of the % 5ʹ-OH RNAs computed across sequences containing nontemplate-strand X at position Y. Red, consensus nucleotides; black, non-consensus nucleotides. Gray box indicates positions where enrichment values could not be computed.

Experimental electron density (contoured at 2.5σ; green mesh) and atomic model for DNA template strand (yellow, red, blue, and orange for C, O, N, and P atoms), dinucleotide primer (green, red, blue, and orange for C, O, N, and P atoms), RNAP active-center catalytic Mg2+(I) (violet sphere), and RNAP bridge helix (gray ribbon). B. Contacts of RNAP residues (gray, red, and blue for C, O, and N atoms) with primer and RNAP active-center catalytic Mg2+(I). RNAP residues are numbered both as in T. thermophilus RNAP and as in E. coli RNAP (in parentheses).

Figure S8. Primer-dependent initiation in vivo: MASTER data analysis
A. 5'-ppp distributions (top panels) and 5'-OH distributions (bottom panels) calculated using uncorrected read counts (left) or using read counts computed using correction factors that account for inefficiencies of enzymatic processing (corrected, right). B. Relative enzymatic processing efficiencies computed for each replicate.