%0 Journal Article
%A Jinny X. Zhang
%A John Z. Fang
%A Wei Duan
%A Lucia R. Wu
%A Angela W. Zhang
%A Neil Dalchau
%A Boyan Yordanov
%A Rasmus Petersen
%A Andrew Phillips
%A David Yu Zhang
%T Predicting DNA Hybridization Kinetics from Sequence
%D 2017
%R 10.1101/149427
%J bioRxiv
%P 149427
%X Hybridization is a key molecular process in biology and biotechnology, but to date there is no predictive model for accurately determining hybridization rate constants based on sequence information. To approach this problem systematically, we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (subsequences of the CYCS and VEGF genes) at temperatures ranging from 28 °C to 55 °C. Next, we rationally designed 38 features computable based on sequence, each feature individually correlated with hybridization kinetics. These features are used in our implementation of a weighted neighbor voting (WNV) algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on similarity to reactions with known rate constants (a.k.a. labeled instances). Automated feature selection and weighting optimization resulted in a final 6-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 2 with ≈74% accuracy and within a factor of 3 with ≈92% accuracy, based on leave-one-out cross-validation. Predictive understanding of hybridization kinetics allows more efficient design of nucleic acid probes, for example in allowing sparse hybrid-capture panels to more quickly and economically enrich desired regions from genomic DNA.
%U https://www.biorxiv.org/content/biorxiv/early/2017/06/13/149427.full.pdf