PT - JOURNAL ARTICLE
AU - Claudio Alberti
AU - Tom Paridaens
AU - Jan Voges
AU - Daniel Naro
AU - Junaid J. Ahmad
AU - Massimo Ravasi
AU - Daniele Renzi
AU - Giorgio Zoia
AU - Idoia Ochoa
AU - Marco Mattavelli
AU - Jaime Delgado
AU - Mikel Hernaez
TI - An introduction to MPEG-G, the new ISO standard for genomic information representation
AID - 10.1101/426353
DP - 2018 Jan 01
TA - bioRxiv
PG - 426353
4099 - http://biorxiv.org/content/early/2018/09/27/426353.short
4100 - http://biorxiv.org/content/early/2018/09/27/426353.full
AB - The MPEG-G standardization project is the largest coordinated international effort to specify a compressed data format that enables large-scale genomic data processing, transport and sharing. It is the first ISO/IEC standard that addresses the problems and limitations of current genomic data formats, aiming at a truly efficient and economical handling of genomic information. It provides the means to implement leading-edge compression technology achieving more than 10x improvement over the BAM format. The standard also provides a set of currently needed functionalities, such as selective access, application programming interfaces to the compressed data, support for data protection mechanisms, and support for streaming applications. Furthermore, ISO/IEC is also engaged in supporting the maintenance of the standard to guarantee the perenniality of applications using MPEG-G. Finally, interoperability and integration with existing genomic information processing pipelines are enabled by supporting conversion from/to the FASTQ/SAM/BAM file formats. In this paper we review the MPEG-G standard in more detail, as well as the main advantages and functionalities it offers.