TY - JOUR
T1 - PyBoost: A parallelized Python implementation of 2D boosting with hierarchies
JF - bioRxiv
DO - 10.1101/170803
SP - 170803
AU - Peyton G. Greenside
AU - Nadine Hussami
AU - Jessica Chang
AU - Anshul Kundaje
Y1 - 2017/01/01
UR - http://biorxiv.org/content/early/2017/07/31/170803.abstract
N2 - Motivation: Gene expression is controlled by networks of transcription factors that bind specific sequence motifs in regulatory DNA elements such as promoters and enhancers. GeneClass is a boosting-based algorithm that learns gene regulatory networks from complementary paired feature sets, such as transcription factor expression levels and binding motifs, across conditions. The algorithm can be used to predict functional genomics measures of cell state, such as gene expression and chromatin accessibility, in different cellular conditions. We present PyBoost, a parallelized, Python-based implementation of GeneClass, along with HiBoost, a novel hierarchical implementation of the algorithm. HiBoost allows regulatory logic to be constrained to a hierarchical group of conditions or cell types. The software can be used to dissect differentiation cascades, time courses, or other perturbation data that naturally form a hierarchy or trajectory. We demonstrate the application of PyBoost and HiBoost to learn regulators of tadpole tail regeneration and hematopoietic stem cell differentiation, and we validate learned regulators through an inducible CRISPR system. Availability: The implementation is publicly available at https://github.com/kundajelab/boosting2D/.
ER -