TY  - JOUR
T1  - Choosing panels of genomics assays using submodular optimization
JF  - bioRxiv
DO  - 10.1101/036137
SP  - 036137
AU  - Kai Wei
AU  - Maxwell W. Libbrecht
AU  - Jeffrey A. Bilmes
AU  - William Stafford Noble
Y1  - 2016/01/01
UR  - http://biorxiv.org/content/early/2016/01/07/036137.abstract
N2  - Genomic sequencing assays such as ChIP-seq and DNase-seq can measure a wide variety of types of genomic activity, but the high cost of sequencing means that a panel of at most 3–10 assays is usually performed on each cell type. Therefore, the choice of which assay types to perform is a crucial step in any genomics project. We present submodular selection of assays (SSA), a method for choosing a diverse panel of genomic assays based on the observed pattern of correlations in existing assays. The method optimizes over submodular functions, which are discrete set functions that have properties analogous to certain continuous convex functions. SSA is computationally efficient, extremely flexible, and is theoretically optimal under certain assumptions. We find that SSA chooses panels of assay types that measure diverse activities, in one case nearly exactly replicating the panel selection choice made by the Roadmap Epigenomics consortium. To quantitatively evaluate SSA, we present a framework for evaluating the quality of a panel of assay types based on three common applications of genomics data sets: imputing assays that have not been performed, locating functional elements such as promoters and enhancers, and annotating the genome using a semi-automated method. Using this framework, we find that panels chosen by SSA perform better than alternative strategies. We therefore expect that SSA will replace manual selection as the first step of future genomics projects. In addition, this application may serve as a model for how submodular optimization can be applied to other discrete problems in biology.
ER  - 