TY  - JOUR
T1  - The GCTx format and cmap{Py, R, M} packages: resources for the optimized storage and integrated traversal of dense matrices of data and annotations
JF  - bioRxiv
DO  - 10.1101/227041
SP  - 227041
AU  - Oana M. Enache
AU  - David L. Lahr
AU  - Ted E. Natoli
AU  - Lev Litichevskiy
AU  - David Wadden
AU  - Corey Flynn
AU  - Joshua Gould
AU  - Jacob K. Asiedu
AU  - Rajiv Narayan
AU  - Aravind Subramanian
Y1  - 2018/01/01
UR  - http://biorxiv.org/content/early/2018/01/03/227041.abstract
N2  - Motivation: Computational analysis of datasets generated by treating cells with pharmacological and genetic perturbagens has proven useful for the discovery of functional relationships. Facilitated by technological improvements, perturbational datasets have grown in recent years to include millions of experiments. While initial studies, such as our work on Connectivity Map, used gene expression readouts, recent studies from the NIH LINCS consortium have expanded to a more diverse set of molecular readouts, including proteomic and cell morphological signatures. Sharing these diverse data creates many opportunities for research and discovery, but the unprecedented size of data generated and the complex metadata associated with experiments have also created fundamental technical challenges regarding data storage and cross-assay integration. Results: We present the GCTx file format and a suite of open-source packages for the efficient storage, serialization, and analysis of dense two-dimensional matrices. The utility of this format is not just theoretical; we have extensively used the format in the Connectivity Map to assemble and share massive data sets comprising 1.7 million experiments. We anticipate that the generalizability of the GCTx format, paired with code libraries that we provide, will stimulate wider adoption and lower barriers for integrated cross-assay analysis and algorithm development. Availability: Software packages (available in Matlab, Python, and R) are freely available at https://github.com/cmap. Supplementary information: Supplementary information is available at clue.io/code. Contact: oana{at}broadinstitute.org
ER  - 