Structural Insights into Regulation of Insulin Expression Involving i-Motif DNA Structures in the Insulin-Linked Polymorphic Region

The insulin linked polymorphic region (ILPR) is a variable number of tandem repeats (VNTR) region of DNA in the promoter of the insulin gene that regulates transcription of insulin. This region is known to form the alternative DNA structures, i-motifs and G-quadruplexes. Individuals have different sequence variants of VNTR repeats and although previous work investigated the effects of some variants on G-quadruplex formation, there is not a clear picture of the relationship between the sequence diversity, the DNA structures formed, and the functional effects on insulin gene expression. Here we show that different sequence variants of the ILPR form different DNA secondary structures and insulin expression is dependent on formation of i-motif and G-quadruplex structures. The first crystal structure and dynamics of an intramolecular i-motif also reveal sequences within the loop regions forming additional stabilising interactions, which are critical to formation of the stable i-motif structures that modulate insulin expression. The outcomes of this work reveal the detail in formation of stable i-motif DNA structures, with potential for rational based drug design for compounds to alter insulin gene expression.


Insulin (INS) is a protein hormone central to the regulation of glucose metabolism. Deficiencies or incorrect production of insulin can lead to hyperglycemia and diabetes mellitus. 1,2 The insulin linked polymorphic region (ILPR) is a variable number of tandem repeats (VNTR) region of DNA in the promoter of the insulin gene that regulates transcription of insulin. 3,4 The ILPR sequence is a minisatellite located 363 bp upstream of the insulin transcription start site, with heterogeneity in the number of tandemly repeated sequences observed among individuals. 4 The predominant ILPR sequence is composed of 14 base pair tandem repeats of 5'-ACAGGGGTGTGGGG-3'/3'-TGTCCCCACACCCC-5'. 4 Shortening of the ILPR and variants in the sequence have been linked to the development of both Type-1 and Type-2 diabetes. [5][6][7][8][9] The ILPR influences the expression of both INS and insulin-like growth factor 2, and genetic variations in the ILPR are also associated with decreased expression of INS. 5,10,11 It has also been shown that insulin itself binds the G-rich regions in the ILPR. 12 However, exactly how changes in the ILPR alter expression of the INS gene remains unclear.
The ILPR is a GC-rich region of DNA. [3][4][5]13,14 The C-rich sequence can form i-motif structures, comprised of two parallel stranded hairpins zipped together by intercalated, hemiprotonated cytosine-cytosine base pairs. 15 In contrast, the G-rich sequence can fold into G-quadruplexes, which form from planar G-quartets held together by Hoogsteen hydrogen bonding and further stabilised by π-π stacking and the coordination of cations. 16 These types of noncanonical secondary structures have been shown to exist in cells and are prevalent within the promoter regions of genes, in particular regions linked to regulation of gene expression and other regulatory elements. 17,18 Although G-quadruplex structures have been well studied, much less research has focussed on i-motifs, the sequences that comprise them, the corresponding structures they form and their biological functions.
Herein we characterise the different variants of the C-rich and G-rich sequences within the ILPR and determine a relationship between the stable formation of i-motif and G-quadruplex structures with corresponding insulin gene expression. The first crystal structure of an intramolecular i-motif also reveals that the sequences within the loop regions, and their additional stabilising interactions, are critical to formation of the stable i-motif structures that control insulin expression.

Characterisation of the Sequence Variants Within the ILPR
The polymorphism of the ILPR sequence extends to at least 11 main variations, with minor changes in the loops or C/G-tracts from the predominant ILPR sequence. 15 Some of these variations were shown to express different levels of insulin compared to the most prevalent ILPR sequence. 5 Previous work studied three of the most prevalent G-rich variants of these sequences and correlated the conformation of the G-quadruplex structure with binding affinity to insulin and insulin-like growth factor. 19 However, to fully understand the relationship between variant sequence, structure and function in the ILPR, we characterised both the C-rich (ILPRC, Table 1) and the G-rich sequences (ILPRG, Table 2) of the 11 main native variants. Each tandem repeat has two tracts of guanines or cytosines (5'-ACAGGGGTGTGGGG-3'/3'-TGTCCCCACACCCC-5'), so we designed our sequences to have two repeats, which provide the minimum sequence necessary for i-motif or G-quadruplex formation. We maintained flanking sequences either side of the terminal C/G-tracts in line with the tandem repeat sequence. The general sequence for each variant was Flank-(C/G-tract)-Loop1-(C/G-tract)-Loop2-(C/G-tract)-Loop3-(C/G-tract)-Flank.
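The general Flank-tract-Loop architecture can be illustrated with a minimal Python sketch that splits a C-rich sequence into its flanks, C-tracts and loops. The function name `split_into_segments` and the rule that a tract is a run of at least four cytosines are illustrative assumptions, not the procedure used to design the variants; the example sequence is variant 4C as given in the molecular dynamics section.

```python
import re

def split_into_segments(seq, tract_base="C", min_tract=4):
    """Split a sequence into flanks, tracts and loops.

    Assumption (hypothetical rule): a tract is any run of at least
    `min_tract` consecutive `tract_base` nucleotides, and exactly four
    tracts are expected.
    """
    tracts = list(re.finditer(f"{tract_base}{{{min_tract},}}", seq))
    if len(tracts) != 4:
        raise ValueError(f"expected 4 tracts, found {len(tracts)}")
    # Boundaries alternate between tract starts and tract ends.
    bounds = [0]
    for match in tracts:
        bounds.extend((match.start(), match.end()))
    bounds.append(len(seq))
    labels = ["flank_5", "loop_1", "loop_2", "loop_3", "flank_3"]
    segments = {
        label: seq[bounds[2 * i]:bounds[2 * i + 1]]
        for i, label in enumerate(labels)
    }
    segments["tracts"] = [match.group() for match in tracts]
    return segments

# Variant 4C, sequence taken from the text.
segments_4c = split_into_segments("TATCCCCACACCCCTATCCCCACACCCCTAT")
print(segments_4c)
```

For 4C this yields TAT flanks, ACA as loops 1 and 3, and TAT as loop 2, matching the loop assignment in the structural description. Passing `tract_base="G"` would apply the same split to a G-rich strand.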
Each C-rich and G-rich variant was characterised using circular dichroism (CD) to determine the overall topology, thermal difference spectroscopy (TDS) to characterise the type of structure in solution, and UV melting/annealing experiments to determine the thermal stability. For the C-rich sequences, the transitional pH was also determined by CD, to allow comparison of the pH stability of the sequence variants.
The C-rich ILPR sequence variants were characterised in 10 mM sodium cacodylate buffer with 100 mM KCl. CD spectroscopy was performed at a range of pH values between 4 and 8 to determine the transitional pH. TDS and UV melting and annealing experiments were performed at pH 5.5 to allow the relative stability of all sequences to be assessed, even those that may not be stable at physiologically relevant pH. We considered that this pH would allow both more stable and less stable variants to be characterised fully and compared alongside each other. A summary of the sequences, the melting temperature (Tm), annealing temperature (Ta), thermal hysteresis (Th), transitional pH, and structural assignment by TDS is provided in Table 1. The corresponding example data are provided in the supporting information (Figures S1-5).
Table 1: Sequences of the C-rich ILPR (ILPRC) variants. Loops are underlined, mutations are in bold, and incomplete C-tracts are shown in red. Melting, annealing and hysteresis temperatures, transitional pH, and structure characterisation from the CD and thermal difference spectra (TDS). Data shown as mean ± SD (n = 3). Nt = no transition observed.
The predominant ILPR C-rich variant (1C) gave clear UV melting and annealing traces (Figure S1) and a Tm of 55 ± 1.7°C (Table 1). This melting temperature is higher than that measured by others for the same sequence, but the previous work was performed in phosphate buffer, which displays reduced buffering capacity at elevated temperatures. 20 The TDS of sequence 1C showed positive peaks at 240 and 265 nm and a negative peak at 295 nm, consistent with i-motif structure (Figure 1A). 21 Similarly, CD spectroscopy of variant 1C showed i-motif formation at acidic pH, indicated by a positive peak at 288 nm and a negative peak at 260 nm (Figure 1B and S2). 22 As the pH increases towards pH 7, the conformation unfolds, the positive peak shifts to 273 nm and the negative peak to 250 nm (Figure S2A). The transitional pH of 1C was determined to be 6.5 (Table 1 and Figure S2F), which was in line with previous experiments for this sequence variant. 23 One ILPRC variant (4C) demonstrated a similar stability to sequence 1C, with a transitional pH of 6.7 (Table 1, Figure S2I), but the other variants were all significantly less pH stable, with transitional pHs as low as 4.7 (sequence variants 10C and 11C, Table 1 and Figure S4). Interestingly, minor differences in sequence made significant changes in the stability of the structures formed. For example, in ILPRC variant 2C, a single C to G mutation in each of the tandem repeats results in significantly lower melting (47 ± 0.4°C compared to 55 ± 1.7°C) and annealing temperatures (p<0.0001) and also a lower transitional pH of 5.1 (p<0.0001). Variant 2C appears to form a hairpin-like structure at pH 5.5 in both the TDS analysis (Figure 1A) and the CD spectra (Figure 1B). That is, this C to G mutation prevents i-motif formation.
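As an illustration of what a transitional pH represents, the sketch below estimates the pH at which a CD signal lies midway between its folded (acidic) and unfolded (neutral) plateaus by linear interpolation. This is a simplified stand-in and an assumption on our part, not necessarily the fitting procedure used for Table 1; the function name and the ellipticity values are hypothetical.

```python
def transitional_ph(ph_values, signal):
    """Estimate the pH at the midpoint of a folding transition.

    Assumptions: `ph_values` is sorted in ascending order, and the first
    and last signal values approximate the folded and unfolded plateaus.
    """
    half = (signal[0] + signal[-1]) / 2.0
    points = list(zip(ph_values, signal))
    for (ph0, s0), (ph1, s1) in zip(points, points[1:]):
        # Look for the interval in which the signal crosses the midpoint.
        if s0 != s1 and (s0 - half) * (s1 - half) <= 0:
            return ph0 + (half - s0) * (ph1 - ph0) / (s1 - s0)
    raise ValueError("signal never crosses its midpoint")

# Hypothetical ellipticity values at 288 nm (arbitrary units).
ph_points = [4.0, 5.0, 6.0, 7.0, 8.0]
ellipticity = [10.0, 10.0, 8.0, 2.0, 0.0]
print(transitional_ph(ph_points, ellipticity))
```

A sigmoidal fit over all points would normally be preferred to interpolation between two points, since it uses the whole titration and is less sensitive to noise.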
Two main factors appear to decrease the stability of i-motif structure in these variants: mutation of loop nucleotides from cytosine to guanine (variants 6C-11C, highlighted in bold in Table 1) or mutation/truncation within the C-tracts (variants 2C, 3C, 5C, 7C and 10C, highlighted in red in Table 1). Some sequences are affected by both these factors (variants 2C, 7C and 10C, Table 1) and have some of the lowest transitional pHs overall (pHTs of 5.2, 5.4 and 4.7, respectively). The C to G mutation is clearly critical as it not only removes cytosines from the core stack of base pairs, but also introduces potential competing Watson/Crick complementary nucleotides, which can shift the conformational equilibrium towards hairpin formation. This is further supported by the data acquired for sequences 10C and 11C, which have more guanines in the loops. Neither of these sequences gave any transitions in the UV melting/annealing experiments at 295 nm (Figure S1D and S1L), but both did so at 260 nm (Figure S1H and S1P), indicative of hairpin/duplex formation. TDS also indicated a spectrum inconsistent with i-motif and more consistent with that of duplex (Figure 1A and Figure S5), suggesting that these sequences in particular form only hairpins under these experimental conditions. In silico structural calculations of these sequences using M-fold 24 also show clear potential for these sequences to form hairpins (Figure S6). It is noteworthy that these sequences also demonstrate a larger hysteresis (6°C) compared to the other ILPRC variants (generally 2-3°C), indicative of slower kinetics in the formation of these hairpin structures.
Given the vast differences in structures and stability of the C-rich sequences, we wanted to also examine the complementary G-rich sequences, to see whether there was any complementarity in terms of the structures formed.
The G-rich ILPR sequence variants were also characterised by CD in 10 mM sodium cacodylate buffer, pH with 100 mM of KCl, NaCl or LiCl to give an indication of cation preference typically observed in G-quadruplex forming sequences. Sequences were characterised by UV melting/annealing in analogous buffer except with 20 mM KCl (100 mM concentrations of KCl resulted in Tm values >95°C). A summary of the characterisation of the sequences in KCl cation conditions: the melting temperature (Tm), annealing temperature (Ta), the hysteresis (Th), QGRS mapper score, 25 and structural assignment by CD and TDS are provided in Table 2. The corresponding example data is provided in the supporting information (Figures S7-9). The predominant ILPR G-rich variant (1G) gave clear UV melting and annealing traces (Figure S7A) and a Tm of 73 ± 0.6°C (Table 2). This melting temperature is similar to that measured previously (~78°C) for the same sequence but in Tris buffer. 26 The TDS of sequence 1G showed several positive peaks at 240, 255 and 270 nm and a negative peak at 295 nm, consistent with G-quadruplex structure 21 (Figure 1C). Variant 1G gave CD spectra with a negative peak at 245 nm, and positive peaks at 263 nm and 295 nm (Figure 1D). This is in line with previously published CD spectroscopy data, showing a mixed population of parallel and antiparallel G-quadruplex formation in presence of KCl and a shift towards antiparallel G4 formation in the presence of weaker stabilizing cations NaCl and LiCl (Figure S8A). 14,26,27 Of the other G-rich sequence variants, 4G (Tm = 74 ± 0.6°C) had similar thermal stability compared to 1G (73 ± 0.6°C), showing the mutation in G-tract Loop2 from a C to a T makes little difference in the stability (Figure S7B and S7F). ILPRG variant 8G was more stable (77 ± 0.6°C) than 1G, but does present as a potential mixture of species by CD and TDS (Figure S7K, S8G, and S9). The most stable of all the variants was sequence 6G (84 ± 1.1°C).
1G, 4G and 6G were all clearly characterized as G-quadruplexes by CD and TDS in KCl cationic conditions, similar to previously described studies on these sequences (Figure S8 and S9). 19,27 Some sequences formed significantly weaker secondary DNA structures and were thermally less stable than these four strong G-quadruplexes. For example, 2G, which is the reverse complement of 2C, has significantly lower melting (Tm = 55 ± 1.2°C compared to the 1G variant with a Tm of 73 ± 0.6°C) and annealing temperatures (p<0.0001). This sequence (2G) also presents as a mixture of G-quadruplex and a hairpin-like structure in the TDS analysis (Figure 1C), and the CD spectrum shows a broad weak positive peak at 300 nm and a negative peak at 245 nm (Figure 1D). This is in line with formation of a weak antiparallel G-quadruplex, potentially mixed with some sort of hairpin/duplex. 22 The fact that there is a melting transition in the UV at 295 nm (Figure S7I) is potentially indicative of the former DNA structure; however, a negative peak at 295 nm in the TDS may present with Z-DNA and Hoogsteen DNA as well as G-quadruplex and i-motif structures. 22 Interestingly, some of the sequences (5G, 7G, 9G, 10G, and 11G) do not have UV melt and anneal profiles at 295 nm, as would be expected for G-quadruplex structures. However, variants 7G, 9G, 10G, and 11G have clear melting and annealing transitions at 260 nm, consistent with these sequences forming hairpins or duplex-like structures (Figure S7). 28 For example, the 10G variant is similar to 2G in the TDS signature (Figure 1C), consistent with hairpin formation and clearly different to that of the G-quadruplexes formed by 1G, 4G, and 6G (Figure S8). The CD spectrum of 10G shows only a very weak positive signal at 260 nm and a negative signal at 240 nm (Figure 1D), which is consistent with an unfolded G-rich sequence or a very weak G-quadruplex.
Notably, the hairpin forming variants also lost cation sensitivity in the CD spectra and all have a narrow dip in signal at 215 nm ( Figure S8). M-fold predictions of all tested variants show clear potential for these sequences to form into hairpins ( Figure S10).
These results indicate that although particular ILPR variants are capable of forming i-motif and G-quadruplex structures, not all of them do. Comparing the biophysical data with the G-scores from QGRS mapper 25 (
Functioning β-cells normally secrete insulin in response to increased blood glucose levels as part of the homeostasis of blood glucose. There are many cell line models that can be used to assess levels of insulin expression in vitro. These cells retain normal regulation of glucose-induced insulin secretion, allowing use of glucose as a positive control. 29 We selected the rat insulinoma-derived cell line INS-1 as a model system due to the lack of an intrinsic ILPR or analogous sequence. 30

Determination of the First Intramolecular i-Motif Crystal Structure
Given the apparent importance of the formation of i-motif and G-quadruplex structures in controlling relative insulin expression, we were interested in the potential interactions within the loops that made certain variants more stable than others. have been shown to alter the widths of the grooves in the structure. 33 There are currently twelve intermolecular i-motif crystal structures formed from two or four separate strands but no intramolecular topologies. The lack of intramolecular crystal structures is mainly because i-motif loops are highly dynamic and difficult to resolve successfully using crystallographic methods. Intramolecular i-motif crystal structures would provide much opportunity for rational design of compounds to target these structures, and potential for drug development against these interesting biological targets, complementing recent drug discovery projects targeting G-quadruplexes.
With this in mind, we wanted to give the best chance for successful crystallisation, so we  Table S1). The crystallisations were performed at a pH below the pHT (5.5), at which this sequence would be most stable.
Crystals of all three variants were obtained by hanging drop methods (Table S2) (Tables S3 and S4). In the 4C-Br sequence (Table S1) scatterers. 34 We are currently exploring ways to optimise the P-signal for P-SAD applications.

Structural Description of an Intramolecular i-Motif from the ILPR
The crystal structure formed from the ILPRC sequence 4C is comprised of two independent and inverted i-motifs in the asymmetric unit (Figure 3A and 3B). Each of these individual i-motifs is formed from four antiparallel strands held together by eight intercalated hemiprotonated cytosine-cytosine base pairs, connected by three loops (Figure 3). The ACA loops connect strands at the minor grooves and the middle TAT loop at the major groove (Figure 3C, Table S5). The terminal CC-base pair is at the 3′-end, making each structure a 3′E topology (Figure 3B). 35 Apart from the CC base pairing, other interactions within each strand include mismatched base pairs like AA and TT which could contribute to the overall stability of the folded construct (Figure 3D, Table S6). In strand A, there is an AA base pair between loop 1 and loop 3, A22 Table S7). Also, in strand B the AA base pair is formed between A22 (loop 3) and A8 rather than with A10 (loop 1) as in strand A.
Strand B is similar to strand A (Figure 3F), with an RMSD of 2.32 Å when the flanks are excluded (i.e. for nucleotides 4 to 28). One difference is that A16 is displaced by a symmetry-related adenine, but still stacks on top of the TT base pair. Also, A8 displaces A10 in the interaction with A22, which allows A10 to interact with a symmetry-related thymine (Figure 3D, Figure S13). When only the core is included in the calculation, the RMSD is 1.04 Å, showing the high similarity between the two cores. Differences in the torsion angles and sugar puckers of the two strands are shown in Table S8 and Figure S15 and are attributed to the flexibility of the phosphate backbone.
As there are two i-motifs in the asymmetric unit, this gives an excellent view as to how more than one i-motif may interact with each other like "beads on a string". There are clear interactions between flank 1 of one strand (A2) and loop 3 (A24) of the other strand ( Figure   3E). Also, there are various π-π stacking interactions between the outer nucleotides of the TAT flanks and an A or T in the loops which highlight the importance of the flanks in the crystal packing (Table S7). Intermolecular TA and CC pairs further contribute to the crystal packing ( Figure S13, S14). Given the ILPR is a polymorphic region within the genome, comprised of tandem repeats, these intermolecular interactions are potentially important for consideration with how ligands and nuclear proteins may interact with these structures.
No specific hydration pattern was observed at the middle of the CC core as most of the cytosine hydrogen bond donors and acceptors are used in the formation of the CC pairs. Some waters at the major groove were seen hydrogen bonded with the H of N4 of the cytosine which is not involved in the CC base pairing and we observe a bridging with the phosphate O atoms. This is in agreement with some of the other tetramolecular or bimolecular i-motif crystal structures published e.g., 1CN0, 33 1BQJ, 33 8DHC, 8CXF, 36 but no bifurcated hydrogen bond to O2 of a cytosine partner was seen. Based on the use of the Fanom(calc) maps ( Figure   S11) we can more confidently describe these peaks as water molecules and exclude sodium or chloride ions. Although limited by resolution, we do observe water molecules in the loops and these could represent potential sites for hydrogen bond interactions, something which will be useful in future ligand design or interactions with proteins. Given the potential binding pocket revealed by A16, which in strand A is base paired with T15 and in strand B this adenine was displaced with a symmetry-related adenine, this indicated to us that this site may also be interesting for potential targeting with ligands.

Stabilising TT Base Pairs are Observed in Solution
Given the interesting additional base pairs within the crystal structure, we were interested whether these could be observed also in solution, so we performed NMR spectroscopy to examine the imino proton region. The imino proton region shows a set of peaks between 15.4 and 15.8 ppm, consistent with the presence of hemi-protonated cytosines ( Figure S16). 15 There are also additional imino proton signals at 10.9 and 11.5 ppm, consistent with the presence of TT base pairs. 37 Importantly, there are no signals in the region between 12.5 and 14 ppm, where the imino proton signals from GC and AT base pairing would be expected. 37,38 An NMR annealing experiment (from 333 K to 277 K) revealed the formation of the CC base pairs first at 319 K, followed by the TT base pairs at 312 K ( Figure S17). The evidence from the NMR indicates that the structure in solution is similar to that in the crystal structure, and the TT base pairs are weaker than the CC base pairs. A recent study looking at i-motifs using a DNA microarray containing 10,976 genomic i-motif forming sequences found that i-motifs with shorter loops (n = 1-4) had enhanced stability when the sequences had thymine residues directly flanking C-tracts. 39 The presence of the TT base pairs in both the NMR experiments and the crystal structure provides structural evidence for the reason why this is the case.

Enhanced Sampling Molecular Dynamics
To further explore the conformational landscape of i-motifs, we performed enhanced sampling molecular dynamics simulations. Of particular interest to us were the loops regions, Information, Figures S18-25 and Table S9). Upon creation of these models, both strands present a free energy landscape consisting of multiple metastable states that also explore the crystallographic conformations.
Given the interactions observed in the crystal structure originating from the flanking sequence, we looked at sequence 4C (TATCCCCACACCCCTATCCCCACACCCCTAT) and also an analogue with one base missing at the 3'-end, 4Cdel (TATCCCCACACCCCTATCCCCACACCCCTA).
Our analysis suggests that 4Cdel is far more dynamic, with multiple interactions, compared to 4C. Upon inspection of the structures extracted from the coarse-grained models, 4Cdel featured far more unstructured conformations in loops 1 and 3, while those in 4C seemed fairly ordered. Loop 2 in both sequences was well ordered. This suggests that slow motions in the i-motif structure largely result from stabilising and compensating for the flexibility not only in loop 2 but also in the regions that flank it, namely the 5' and 3' ends. This can be visualised in the dynamics of 4C and 4Cdel.
The 4C structure is longer than 4Cdel by one base (T) at the 3' end. This extra nucleotide leads to significantly more structural ordering via π-stacking interactions within the 3' end itself, which then leads to the ordered conformations observed in loop 2. In 4Cdel, the additional stacking is not possible and the interactions with loop 2 therefore produce a greater number of metastable states. Since time-lagged independent components (tICs) are ordered from the slowest to the fastest motions, those that provide the most stability will rank higher than those that are faster. But still a significant number may not be fully

Oligonucleotides
All tested DNA sequences were synthesised and reverse phase HPLC purified by Eurogentec (Belgium) and prepared to a final concentration of 1 mM (biophysics) or 1.5 mM (crystallography) in ultra-pure water and confirmed using a Nanodrop. Each experimental section states the DNA concentration and buffer system in which they were prepared.
Biophysical samples were annealed by heating for 5 mins at 95°C in a heating block and allowed to anneal by slow cooling to room temperature overnight. Crystallography samples were annealed using a thermocycler/PCR machine. The temperature was held at 60°C for 5 minutes, above the melting temperature for the sequences, followed by cooling to 51°C at a rate of 1°C/min and then to 4°C at a rate of 1°C/3 mins. The slower cooling rate of 1°C/3 mins was ideal to allow the sequences to fold and to avoid the gel formation that was observed when higher rates were used.
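The length of the crystallography annealing program follows directly from the hold time and the two cooling rates stated above. The short sketch below works through that arithmetic; the helper name `ramp_minutes` is hypothetical and the calculation ignores any instrument overhead between steps.

```python
def ramp_minutes(start_c, end_c, minutes_per_degree):
    """Duration of a linear temperature ramp, in minutes."""
    return abs(start_c - end_c) * minutes_per_degree

hold = 5                               # 5 min hold at 60 °C
fast_ramp = ramp_minutes(60, 51, 1)    # 1 °C/min from 60 °C to 51 °C
slow_ramp = ramp_minutes(51, 4, 3)     # 1 °C per 3 min from 51 °C to 4 °C
total = hold + fast_ramp + slow_ramp

print(f"fast ramp: {fast_ramp} min, slow ramp: {slow_ramp} min, total: {total} min")
```

Under these assumptions the slow ramp dominates the program, which runs for a little over two and a half hours in total.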

Circular Dichroism Spectroscopy
The at a rate of 0.5°C/min (melting). When the temperature reached 95°C, it was held for 10 min before the process was reversed (annealing). The average melting (Tm) and annealing (Ta) temperatures were identified by the first derivative method for each measured cycle.
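The first derivative method takes the transition temperature as the point where the absorbance changes most steeply with temperature. The sketch below shows one minimal way to do this with central differences; the function name, the 1°C sampling interval and the synthetic curve are illustrative assumptions, and real traces would usually be smoothed before differentiation.

```python
import math

def first_derivative_tm(temps, absorbance):
    """Return the temperature at which |dA/dT| is largest.

    Uses central differences on the interior points; no smoothing is
    applied, so noisy data would need filtering first (assumption).
    """
    best_temp, best_slope = None, 0.0
    for i in range(1, len(temps) - 1):
        slope = (absorbance[i + 1] - absorbance[i - 1]) / (temps[i + 1] - temps[i - 1])
        if abs(slope) > best_slope:
            best_temp, best_slope = temps[i], abs(slope)
    return best_temp

# Synthetic melting curve at 295 nm with a midpoint at 55 °C (hypothetical data).
temps = list(range(4, 96))
melt = [1.0 / (1.0 + math.exp((t - 55) / 3.0)) for t in temps]

tm = first_derivative_tm(temps, melt)
print(tm)
```

Applying the same function to the annealing trace gives Ta, and the thermal hysteresis is then Th = Tm - Ta.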
The thermal difference spectrum (TDS) was obtained by measuring the absorbance spectrum from 230 nm to 320 nm after 10 mins at 4°C for the folded DNA structure and after 10 mins at 95°C for the unfolded structure. The TDS signature was determined by subtracting the absorbance spectrum of the folded structure from that of the unfolded structure, zero correcting at 320 nm, and then normalising to the maximum absorbance.
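The TDS calculation described above can be sketched in a few lines of Python: subtract the folded spectrum from the unfolded one, zero the result at 320 nm, and normalise to the maximum. The function name and the absorbance values are hypothetical, and the sketch assumes both spectra share the same wavelength grid.

```python
def thermal_difference_spectrum(wavelengths, folded, unfolded, zero_nm=320):
    """Compute a normalised thermal difference spectrum.

    Assumptions: `folded` and `unfolded` are absorbances measured on the
    same wavelength grid, and `zero_nm` is one of the measured wavelengths.
    """
    diff = [u - f for u, f in zip(unfolded, folded)]   # unfolded minus folded
    offset = diff[wavelengths.index(zero_nm)]          # zero correction at 320 nm
    diff = [d - offset for d in diff]
    peak = max(diff)
    if peak <= 0:
        raise ValueError("no positive peak to normalise against")
    return [d / peak for d in diff]

# Hypothetical absorbance values, for illustration only.
wavelengths = [230, 240, 265, 295, 320]
folded = [0.50, 0.40, 0.60, 0.30, 0.02]     # 4 °C
unfolded = [0.55, 0.60, 0.80, 0.20, 0.02]   # 95 °C

tds = thermal_difference_spectrum(wavelengths, folded, unfolded)
for wl, value in zip(wavelengths, tds):
    print(wl, round(value, 2))
```

With this sign convention, a structure that absorbs more strongly at 295 nm when folded than when unfolded gives a negative TDS value at 295 nm, as reported for the i-motif and G-quadruplex forming variants.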

Crystallography Preparation of materials
The DNA sequences used in the crystallisations are provided in Table S1. Bromination in the 4C-Br sequence was on carbon-5 of the Cytosine4 and carbon-8 for the Adenine16. As we have previously shown, substitutions at these positions do not disrupt the folded i-motif topologies. 40 Oligonucleotides were annealed in the presence of 10 mM sodium cacodylate, 18 mM NaCl at pH 5.5.

Crystallisations
Crystallisations were achieved using the hanging-drop vapor diffusion method. two weeks at 10°C. It was still possible to grow good quality 4C crystals at 4°C but took longer.
4C-Br, 4Ca and 4Cb crystals were grown at 10 °C using the conditions summarized in Table S2.

Data collection, phasing, and structure determination
Crystals were harvested using loops, placed in oil to remove mother liquor, and then cooled in liquid nitrogen. All diffraction data were collected at the Diamond Light Source synchrotron (DLS), UK.
P-SAD data for the 4C sequence were collected at the long wavelength I23 beamline in vacuum. 41 The wavelength was tuned below the P-absorption edge (λ = 5.7788 Å, f''4) to eliminate absorption. The maximum wavelength used during this experiment was 3.9995 Å, f''2.3 to reduce absorption. Our data collections were undertaken at low-dose levels to prevent radiation damage and improve merging of multiple datasets. Good quality P-SAD data were collected but still resulted in a low P-anomalous signal, as shown in Table S3. The signal from the intrinsic P atoms alone proved insufficient for phasing and structural determination, likely due to the low number of unique reflections compared to the number of anomalous scatterers. 34 Based on data quality indicators (resolution range, completeness, I/σ(I) and Rmeas), the data collected at λ = 2.4797 Å, f''1.0 were selected, consisting of six datasets merged to increase the resolution and thus the number of unique reflections. Data were reduced using xia2.multiplex and subsequently re-scaled to 2.25 Å resolution with Aimless. 42,43 All data collection details are summarized in Table S3.
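For readers more used to photon energies than wavelengths, the wavelengths quoted in the data collection paragraphs can be converted with E = hc/λ. The sketch below is a convenience calculation only; the constant is the standard value of hc in keV·Å, and the function name and labels are our own.

```python
HC_KEV_ANGSTROM = 12.398419843  # hc expressed in keV·Å

def photon_energy_kev(wavelength_angstrom):
    """Convert an X-ray wavelength in Å to a photon energy in keV."""
    return HC_KEV_ANGSTROM / wavelength_angstrom

# Wavelengths quoted in the text for the P-SAD and Br-MAD experiments.
quoted = {
    "P edge": 5.7788,
    "longest P-SAD wavelength": 3.9995,
    "selected P-SAD dataset": 2.4797,
    "Br peak": 0.9196,
    "Br inflection": 0.9203,
    "Br remote": 0.9117,
}
for label, wavelength in quoted.items():
    print(f"{label}: {wavelength} Å = {photon_energy_kev(wavelength):.3f} keV")
```

For example, the selected P-SAD wavelength of 2.4797 Å corresponds to a photon energy of approximately 5 keV.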
Data for the 4C-Br sequence were collected at the I03 beamline. A Br-MAD experiment was conducted at three wavelengths: 0.9196 Å (peak), 0.9203 Å (inflection point) and 0.9117 Å (remote). Data was reduced and scaled as described above. Data collection details are summarised in Table S4. The Shelx pipeline was used to determine the positions of the Br atoms and phase information. 44 These phases were then used with the merged P-SAD dataset of 4C described above. Cycles of model building and refinement were then performed using COOT, 45 and REFMAC5 (CCP4i package) 46

Cell Culture
INS-1 rat insulinoma cells (AddexBio) were cultured in RPMI-1640 medium supplemented with 10% FBS and 50 µM 2-mercaptoethanol (BME), and 1% penicillin-streptomycin which were all obtained from Gibco. The medium was changed every four days and cells were expanded when reaching 80% confluency. Experiments were carried out in cells between passage 5-9. Sodium-pyruvate, 50 µM BME and 2.8 mM Glucose (all additives were obtained from Gibco).

INS
The overnight starved cells were treated with fresh low glucose medium or high glucose medium (16.2 mM). The four different ILPR variants were treated for 4 h. The Dual Luciferase assay (Promega) was performed according to the instruction manual and luminescence signals were measured on a SpectraMax iD3 (Molecular Devices). The resulting Firefly signal was corrected with the corresponding Renilla signal and the final data were normalised to the Firefly/Renilla ratio of the low glucose signal to account for technical variation while preserving biological variation among 12 experimental repeats. 50 Data analysis and presentation were performed using GraphPad Prism version 9.0. All data sets passed all normal distribution tests available in GraphPad Prism and are presented as mean ± SEM with indicated sample sizes (n). The statistical difference between treatments or variants was determined by one-way ANOVA and corrected with Holm-Sidak post hoc analysis.
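The two-step normalisation described above can be written out as a short calculation: each Firefly reading is divided by its Renilla reading, and the resulting ratio is divided by the ratio obtained under low glucose. The function name and the luminescence counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def relative_expression(firefly, renilla, firefly_low, renilla_low):
    """Firefly/Renilla ratio of a sample relative to the low glucose control."""
    sample_ratio = firefly / renilla              # correct for the Renilla signal
    baseline_ratio = firefly_low / renilla_low    # low glucose reference
    return sample_ratio / baseline_ratio

# Hypothetical luminescence counts for one experimental repeat.
fold_change = relative_expression(
    firefly=3000.0, renilla=100.0,          # high glucose well
    firefly_low=1500.0, renilla_low=100.0,  # low glucose well
)
print(fold_change)
```

A value above 1 indicates higher reporter expression than in the low glucose condition of the same repeat, so repeats with different absolute signal levels can be compared on a common scale.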

NMR
NMR data was recorded for the 4C ILPRC sequence. The DNA concentration was 0.66 mM in 9.1 mM sodium cacodylate buffer at pH 5.5, 91 mM KCl and 17% D2O. NMR data were acquired using a 700 MHz Bruker Avance III NMR spectrometer equipped with a TCI cryoprobe operating Topspin 3.6.2. 1H 1D spectra were acquired using a perfect echo watergate experiment, 51 with a 1 s recycle delay, a 35 μs delay for binomial water suppression, a 780 ms acquisition time and a 30 ppm spectral width, and chemical shifts were referenced to the solvent. Data were processed with exponential window functions using nmrPipe. 52

Enhanced Sampling Molecular Dynamics
The initial ILPR i-motif crystal structure consists of two near-identical motifs packed into a dimer as a result of crystallisation. These two i-motifs were then separated into 4C (TATCCCCACACCCCTATCCCCACACCCCTAT) and a modified shortened variant 4Cdel, which is missing the terminal T: (TATCCCCACACCCCTATCCCCACACCCCTA). Adaptive Bandit simulations were run using the ACEMD molecular dynamics engine. 53 Full details of the protocol are listed in the supplementary information. The simulation protocol was identical for both strands. The Markov State Models were built using the PyEMMA software. 54