Abstract
The MPEG-G standardization project is the largest coordinated international effort to specify a compressed data format that enables large-scale genomic data processing, transport, and sharing. It is the first ISO/IEC standard to address the problems and limitations of current genomic data formats, with the aim of truly efficient and economical handling of genomic information. It provides the means to implement leading-edge compression technology that achieves more than a 10x improvement over the BAM format. The standard also provides a set of functionalities needed today, such as selective access, application programming interfaces to the compressed data, support for data protection mechanisms, and support for streaming applications. Furthermore, ISO/IEC is also engaged in maintaining the standard, to guarantee the longevity of applications using MPEG-G. Finally, interoperability and integration with existing genomic information processing pipelines are enabled by supporting conversion to and from the FASTQ, SAM, and BAM file formats.
In this paper, we review the MPEG-G standard in more detail, along with the main advantages and functionalities it offers.