PT - JOURNAL ARTICLE
AU - Javier Quilez
AU - Enrique Vidal
AU - François Le Dily
AU - François Serra
AU - Yasmina Cuartero
AU - Ralph Stadhouders
AU - Guillaume Filion
AU - Thomas Graf
AU - Marc A. Marti-Renom
AU - Miguel Beato
TI - Managing the analysis of high-throughput sequencing data
AID - 10.1101/136358
DP - 2017 Jan 01
TA - bioRxiv
PG - 136358
4099 - http://biorxiv.org/content/early/2017/05/10/136358.short
4100 - http://biorxiv.org/content/early/2017/05/10/136358.full
AB - In the last decade we have witnessed a tremendous rise in sequencing throughput as well as an increasing number of genomic assays based on high-throughput sequencing (HTS). As a result, the management and analysis of the growing amount of sequencing data present several challenges, with consequences for the cost, quality and reproducibility of research. The most common issues include poor description and ambiguous identification of samples, lack of systematic data organization, absence of automated analysis pipelines and lack of tools to aid the interpretation of results. To address these problems, we suggest structuring HTS data management by automating the quality control of the raw data, establishing metadata collection and sample identification systems, and organizing the HTS data in a human-friendly hierarchy. We insist on restricting metadata field entries to multiple-choice values instead of free text, and on implementing a future-proof organization of the data on storage. These actions in turn enable the automation of the analysis and the deployment of web applications to facilitate data interpretation. Finally, comprehensive documentation of the procedures applied to HTS data is fundamental for reproducibility. To illustrate how these recommendations can be implemented, we present a didactic dataset. This work seeks to clearly define a set of best practices for managing the analysis of HTS data and provides a quick-start guide for implementing them in any sequencing project.