RT Journal Article
SR Electronic
T1 Sparse Project VCF: efficient encoding of population genotype matrices
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 611954
DO 10.1101/611954
A1 Michael F. Lin
A1 Xiaodong Bai
A1 William J. Salerno
A1 Jeffrey G. Reid
YR 2019
UL http://biorxiv.org/content/early/2019/04/17/611954.abstract
AB Summary: Variant Call Format (VCF), the prevailing representation for germline genotypes in population sequencing, suffers rapid size growth as larger cohorts are sequenced and more rare variants are discovered. We present Sparse Project VCF (spVCF), an evolution of VCF with judicious entropy reduction and run-length encoding, delivering ~10X size reduction for modern studies with practically minimal information loss. spVCF interoperates with VCF efficiently, including tabix-based random access. Availability and Implementation: Freely available at github.com/mlin/spVCF. Contact: dna{at}mlin.net
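The abstract names run-length encoding of the genotype matrix but does not show how it works. The Python below is a minimal sketch of the general idea. It is not the spVCF implementation and does not reproduce its exact syntax. It rests on these assumptions: rows are variant sites and columns are samples. A cell identical to the cell directly above it is replaced by a hypothetical marker `"`, and a run of n > 1 such markers is collapsed into `"n`. The function names are hypothetical. The actual encoding is defined in the repository at github.com/mlin/spVCF.

```python
from typing import List, Optional

# Hypothetical marker meaning "same as the cell directly above".
# Assumes no genuine cell value starts with this character.
REPEAT = '"'


def _flush(out: List[str], run: int) -> None:
    """Append a pending run of repeats as '"' (n == 1) or '"n' (n > 1)."""
    if run == 1:
        out.append(REPEAT)
    elif run > 1:
        out.append(f"{REPEAT}{run}")


def encode_row(row: List[str], prev: Optional[List[str]]) -> List[str]:
    """Sparse-encode one row against the previous (decoded) row."""
    out: List[str] = []
    run = 0
    for j, cell in enumerate(row):
        if prev is not None and cell == prev[j]:
            run += 1  # extend the current run of repeated cells
        else:
            _flush(out, run)
            run = 0
            out.append(cell)  # cell differs from the one above: keep it verbatim
    _flush(out, run)
    return out


def decode_row(tokens: List[str], prev: Optional[List[str]]) -> List[str]:
    """Invert encode_row by copying repeated cells from the previous row."""
    row: List[str] = []
    for tok in tokens:
        if tok.startswith(REPEAT):
            if prev is None:
                raise ValueError("repeat marker in the first row")
            n = int(tok[1:]) if len(tok) > 1 else 1
            row.extend(prev[len(row):len(row) + n])
        else:
            row.append(tok)
    return row


def encode_matrix(rows: List[List[str]]) -> List[List[str]]:
    encoded: List[List[str]] = []
    prev: Optional[List[str]] = None
    for row in rows:
        encoded.append(encode_row(row, prev))
        prev = row
    return encoded


def decode_matrix(encoded: List[List[str]]) -> List[List[str]]:
    rows: List[List[str]] = []
    prev: Optional[List[str]] = None
    for tokens in encoded:
        prev = decode_row(tokens, prev)
        rows.append(prev)
    return rows


if __name__ == "__main__":
    # Rows are variant sites, columns are samples (GT:DP-style cells, illustrative only).
    rows = [
        ["0/0:30", "0/0:30", "0/1:25", "0/0:30"],
        ["0/0:30", "0/0:30", "0/0:28", "0/0:30"],
        ["0/0:30", "0/0:30", "0/0:28", "0/0:30"],
    ]
    enc = encode_matrix(rows)
    print(enc[1:])  # [['"2', '0/0:28', '"'], ['"4']]
    assert decode_matrix(enc) == rows
```

In this example the third row collapses to a single token because every sample matches the row above. Any preprocessing that made neighboring cells identical more often, which is the role the abstract assigns to entropy reduction, would lengthen such runs. This sketch does not model that step. Each sparse row also depends on the row above it, so the sketch alone cannot start decoding at an arbitrary site. The paper and repository, not this example, describe how spVCF still supports tabix-based random access.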