ABSTRACT
DNA primase is an essential enzyme that synthesizes short RNA primers on specific DNA sequences. These RNA primers are elongated by DNA polymerase to form Okazaki fragments on the lagging DNA strand. It is therefore reasonable to assume that the binding of DNA primase to a genome marks the start sites of the Okazaki fragments. However, it has long been known that the frequency of occurrence of primase trinucleotide recognition sites in a genome sequence has no influence on the size of the Okazaki fragments. The unresolved enigma that we address in this study is why some, but not all, primase-DNA recognition sequences (PDRSs) become Okazaki fragment start sites. To this end, we applied machine-learning algorithms to a large body of data obtained from protein-DNA binding microarrays (PBMs), with the aim of identifying the essential elements on DNA that are needed for the binding of bacteriophage T7 primase. A learning algorithm trained on the PBM data enabled the prediction of T7 primase binding values for any given DNA sequence with unprecedented accuracy and flexibility. On the basis of the principles learned about DNA-primase binding, we generated novel DNA sequences with improved binding of T7 primase and improved RNA primer synthesis, as validated experimentally.
Competing Interest Statement
The authors have declared no competing interest.
ABBREVIATIONS
- PBM: protein-DNA binding microarray
- PDRS: primase-DNA recognition sequence
- EDA: exploratory data analysis
- PCA: principal component analysis
- WD: Ward distance