PT - JOURNAL ARTICLE
AU - Adam Soffer
AU - Morya Ifrach
AU - Stefan Ilic
AU - Ariel Afek
AU - Dan Vilenchik
AU - Barak Akabayov
TI - Reconfiguring Okazaki fragment start sites on a genome by using a data-driven approach
AID - 10.1101/2020.09.29.317842
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.09.29.317842
4099 - http://biorxiv.org/content/early/2020/09/30/2020.09.29.317842.short
4100 - http://biorxiv.org/content/early/2020/09/30/2020.09.29.317842.full
AB - DNA primase is an essential enzyme that synthesizes short RNA primers on specific DNA sequences. These RNA primers are elongated by DNA polymerase to form Okazaki fragments on the lagging DNA strand. It is therefore reasonable to assume that the binding of DNA primase on a genome marks the start sites of the Okazaki fragments. It has long been known that the frequency of the occurrence of primase trinucleotide recognition on a genome sequence has no influence on the size of the Okazaki fragments. The unresolved enigma that we address in this study is therefore why some, but not all, primase-DNA recognition sequences (PDRSs) become Okazaki fragment start sites. To this end, we applied machine-learning algorithms to analyze a massive amount of data obtained from protein-DNA binding microarrays (PBM) with the aim of identifying the essential elements on DNA that are needed for the binding of bacteriophage T7 primase. A PBM data learning algorithm enabled the prediction of binding values of T7 primase for any given DNA sequence with unprecedented accuracy and flexibility. On the basis of the principles learned about DNA-primase binding, we generated novel DNA sequences with improved binding of T7 primase and improved RNA primer synthesis, as validated experimentally.
Competing Interest Statement: The authors have declared no competing interest.
Abbreviations: PBM, protein-DNA binding microarray; PDRS, primase-DNA recognition sequence; EDA, exploratory data analysis; PCA, principal component analysis; WD, Ward distance.