TY  - JOUR
T1  - Interactive Analytics for Very Large Scale Genomic Data
JF  - bioRxiv
DO  - 10.1101/035295
SP  - 035295
AU  - Cuiping Pan
AU  - Nicole Deflaux
AU  - Gregory McInnes
AU  - Michael Snyder
AU  - Jonathan Bingham
AU  - Somalee Datta
AU  - Philip Tsao
Y1  - 2015/01/01
UR  - http://biorxiv.org/content/early/2015/12/24/035295.abstract
N2  - Large scale genomic sequencing is now widely used to decipher questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. With the quantity and diversity these data harbor, a robust and scalable data handling and analysis solution is desired. Here we present interactive analytics using public cloud infrastructure and distributed computing database Dremel and developed according to the standards of Global Alliance for Genomics and Health, to perform information compression, comprehensive quality controls, and biological information retrieval in large volumes of genomic data. We demonstrate that such computing paradigms can provide orders of magnitude faster turnaround for common analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds.
ER  - 