PT - JOURNAL ARTICLE
AU - Lyon, Matthew
AU - Andrews, Shea J
AU - Elsworth, Ben
AU - Gaunt, Tom R
AU - Hemani, Gibran
AU - Marcora, Edoardo
TI - The variant call format provides efficient and robust storage of GWAS summary statistics
AID - 10.1101/2020.05.29.115824
DP - 2020 Jan 01
TA - bioRxiv
PG - 2020.05.29.115824
4099 - http://biorxiv.org/content/early/2020/05/30/2020.05.29.115824.short
4100 - http://biorxiv.org/content/early/2020/05/30/2020.05.29.115824.full
AB - Genome-wide association study (GWAS) summary statistics are a fundamental resource for a variety of research applications [1–6]. Yet despite their widespread utility, no common storage format has been widely adopted, hindering tool development and data sharing, analysis and integration. Existing tabular formats [7,8] often ambiguously or incompletely store information about genetic variants and their associations, and also lack essential metadata, increasing the possibility of errors in data interpretation and post-GWAS analyses. Additionally, data in these formats are typically not indexed, requiring the whole file to be read, which is computationally inefficient. To address these issues, we propose an adaptation of the variant call format [9] (GWAS-VCF) and have produced a suite of open-source tools for using this format in downstream analyses. Simulation studies determine GWAS-VCF is 9-46x faster than tabular alternatives when extracting variant(s) by genomic position. Our results demonstrate the GWAS-VCF provides a robust and performant solution for sharing, analysis and integration of GWAS data. We provide open access to over 10,000 complete GWAS summary datasets converted to this format (available from: https://gwas.mrcieu.ac.uk). Competing Interest Statement: TRG receives funding from GlaxoSmithKline and Biogen for unrelated research.