PT - JOURNAL ARTICLE
AU - Robert A. McLeod
AU - Ricardo Diogo Righetto
AU - Andy Stewart
AU - Henning Stahlberg
TI - MRCZ – A proposed fast compressed MRC file format and direct detector normalization strategies
AID - 10.1101/116533
DP - 2017 Jan 01
TA - bioRxiv
PG - 116533
4099 - http://biorxiv.org/content/early/2017/03/13/116533.short
4100 - http://biorxiv.org/content/early/2017/03/13/116533.full
AB - The introduction of high-speed CMOS detectors is fast marching the field of transmission electron microscopy into an intersection with the computer science field of big data. Automated data pipelines to control the instrument and the initial processing steps are imposing more and more onerous requirements on data transfer and archiving. We present a proposal for expansion of the venerable MRC file format to combine integer decimation and lossless compression to reduce storage requirements and improve file read/write times by >1000% compared to uncompressed floating-point data. The integer decimation of data necessitates application of the gain normalization and outlier pixel removal at the data destination, rather than the source. With direct electron detectors, the normalization step is typically provided by the vendor and is not open-source. We provide robustly tested normalization algorithms that perform at least as well as vendor software. We show that the generation of hot pixels is a highly dynamic process in direct electron detectors, and that outlier pixels must be detected on a stack-by-stack basis. In comparison, the low-frequency bias features of the detectors, induced by the electronics on top of the active layer, are extremely stable with time. Therefore, we introduce a stochastic-based approach to identify outlier pixels and smoothly filter them, such that the degree of correlated noise in micrograph stacks is reduced. Both a priori and a posteriori gain normalization approaches that are compatible with pipeline image processing are discussed.
The a priori approach adds a gamma-correction to the gain reference, and the a posteriori approach normalizes by a moving average of time-adjacent stacks, with the current stack being knocked out, known as the KOMA (knock-out moving average) filter. The combination of outlier filter and KOMA normalization over ~25 frames can reduce the correlated noise in movies to nearly zero. Sample libraries and a command-line utility are hosted at github.com/em-MRCZ and released under the BSD license.
uint8: unsigned 8-bit integer computer data, range 0–255
float32: floating-point 32-bit computer data, ~6 significant figures
Blosc: blocked, shuffle, compress library
CDF: cumulative density function
CMOS: complementary metal-oxide semiconductor
PDF: probability density function
GB: Gigabyte
Gb: Gigabit
MB: Megabyte
KOMA: knock-out moving average
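
The abstract describes the KOMA filter only in words: each stack is normalized by a moving average of time-adjacent stacks, with the current stack left out. The sketch below is one possible reading of that description, not the authors' implementation. It assumes each stack has already been reduced to a single flattened average image, and the names `koma_reference`, `koma_normalize`, and `half_window` are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: one flattened average image per stack.
Image = List[float]


def koma_reference(stack_means: Sequence[Image], index: int, half_window: int) -> Image:
    """Average the per-stack images near `index`, knocking out `index` itself."""
    lo = max(0, index - half_window)
    hi = min(len(stack_means), index + half_window + 1)
    neighbours = [stack_means[k] for k in range(lo, hi) if k != index]
    if not neighbours:
        raise ValueError("need at least one neighbouring stack")
    n_pix = len(stack_means[index])
    return [sum(img[p] for img in neighbours) / len(neighbours) for p in range(n_pix)]


def koma_normalize(stack_means: Sequence[Image], index: int, half_window: int) -> Image:
    """Divide stack `index` by a unit-mean reference built from its neighbours."""
    ref = koma_reference(stack_means, index, half_window)
    mean_ref = sum(ref) / len(ref)
    # Scaling the reference to unit mean keeps the overall intensity unchanged.
    # Assumption: a pixel whose reference value is zero is left untouched.
    return [
        pix * mean_ref / r if r else pix
        for pix, r in zip(stack_means[index], ref)
    ]
```

With a fixed per-pixel gain pattern and a different uniform dose in each stack, calling `koma_normalize(stacks, i, half_window)` returns a flat image for stack `i`, because the pattern shared by the neighbouring stacks divides out. Leaving the current stack out of the average means its own noise does not feed into its own reference.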