bioRxiv
New Results

MRCZ – A proposed fast compressed MRC file format and direct detector normalization strategies

Robert A. McLeod, Ricardo Diogo Righetto, Andy Stewart, Henning Stahlberg
doi: https://doi.org/10.1101/116533
Robert A. McLeod
Center for Cellular Imaging and NanoAnalytics (C-CINA), University of Basel, Basel, Switzerland
For correspondence: robbmcleod@gmail.com
Ricardo Diogo Righetto
Center for Cellular Imaging and NanoAnalytics (C-CINA), University of Basel, Basel, Switzerland
Andy Stewart
Department of Physics, University of Limerick, Limerick, Ireland
Henning Stahlberg
Center for Cellular Imaging and NanoAnalytics (C-CINA), University of Basel, Basel, Switzerland

Abstract

The introduction of high-speed CMOS detectors is rapidly bringing the field of transmission electron microscopy into intersection with the computer-science field of big data. Automated data pipelines that control the instrument and perform the initial processing steps impose increasingly onerous requirements on data transfer and archiving. We propose an extension of the venerable MRC file format that combines integer decimation and lossless compression to reduce storage requirements and to improve file read/write times by >1000 % compared to uncompressed floating-point data. Integer decimation of the data requires that gain normalization and outlier-pixel removal be applied at the data destination rather than at the source. With direct electron detectors, the normalization step is typically provided by the vendor and is not open-source. We provide robustly tested normalization algorithms that perform at least as well as vendor software. We show that the generation of hot pixels is a highly dynamic process in direct electron detectors, and that outlier pixels must be detected on a stack-by-stack basis. In contrast, the low-frequency bias features of the detectors, induced by the electronics on top of the active layer, are extremely stable over time. We therefore introduce a stochastic approach that identifies outlier pixels and smoothly filters them, reducing the degree of correlated noise in micrograph stacks. We discuss both a priori and a posteriori gain-normalization approaches that are compatible with pipeline image processing. The a priori approach applies a gamma correction to the gain reference; the a posteriori approach normalizes by a moving average of time-adjacent stacks from which the current stack is knocked out, termed the KOMA (knock-out moving average) filter. Combining the outlier filter with KOMA normalization over ~25 frames can reduce the correlated noise in movies to nearly zero.
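The knock-out moving average can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: each stack has already been summed into a flat list of pixel values; the moving average uses a symmetric window of `half_width` stacks on each side (the default of 12, giving a 25-stack window including the knocked-out stack, is only loosely matched to the ~25 quoted above); and the gain estimate is rescaled to unit mean. The function names `koma_gain` and `koma_normalize` are hypothetical.

```python
from statistics import fmean


def koma_gain(stack_sums, index, half_width):
    """Per-pixel gain estimate for stack `index` (hypothetical helper): the
    average of up to `half_width` stacks on either side of it, with stack
    `index` itself knocked out of the average."""
    lo = max(0, index - half_width)
    hi = min(len(stack_sums), index + half_width + 1)
    neighbours = [stack_sums[j] for j in range(lo, hi) if j != index]
    if not neighbours:
        raise ValueError("KOMA needs at least one neighbouring stack")
    n_pix = len(stack_sums[index])
    avg = [fmean(s[p] for s in neighbours) for p in range(n_pix)]
    # Assumption: rescale to unit mean so normalization preserves overall dose.
    scale = fmean(avg)
    if scale <= 0:
        raise ValueError("neighbouring stacks contain no signal")
    return [a / scale for a in avg]


def koma_normalize(stack_sums, half_width=12):
    """Divide each summed stack by its own knock-out moving-average gain."""
    normalized = []
    for i, image in enumerate(stack_sums):
        gain = koma_gain(stack_sums, i, half_width)
        # Pixels whose estimated gain is zero (e.g. dead pixels) are set to 0.
        normalized.append([v / g if g > 0 else 0.0 for v, g in zip(image, gain)])
    return normalized


# Toy example: five summed stacks of a four-pixel detector with a fixed gain
# pattern under flat illumination, plus a transient hot pixel in stack 2 only.
true_gain = [0.5, 1.0, 1.5, 1.0]
stacks = [[100.0 * g for g in true_gain] for _ in range(5)]
stacks[2][0] = 500.0
result = koma_normalize(stacks, half_width=2)
# For stack 2, the fixed gain pattern is divided out (pixels 1-3 become 100.0),
# while its own hot pixel is excluded from its own gain estimate and stays
# prominent (500 / 0.5 = 1000.0) instead of being normalized away.
```

Because each average runs over neighbouring stacks, stack 2's hot pixel still inflates the gain estimates of stacks 0, 1, 3 and 4 in this toy example, which fits the abstract's point that the outlier filter and KOMA normalization are most effective in combination.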
Sample libraries and a command-line utility are hosted at github.com/em-MRCZ and released under the BSD license.
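The storage side of the proposal, integer decimation followed by lossless compression, can be illustrated with the Python standard library alone. This sketch is a stand-in, not the MRCZ libraries or their API: it substitutes `zlib` for Blosc, uses a synthetic low-dose counting frame, and assumes a hypothetical per-pixel gain scattered uniformly between 0.9 and 1.1. The helper `frame_sizes` is likewise hypothetical. It compares one frame stored as gain-normalized float32 values with the same frame stored as raw uint8 counts; it shows the mechanism only and does not reproduce the read/write timings reported above.

```python
import random
import struct
import zlib


def frame_sizes(counts, gains, level=1):
    """Return raw and zlib-compressed byte sizes for one detector frame stored
    (a) as gain-normalized float32 values and (b) as raw uint8 counts."""
    # (a) Normalizing at the source yields non-integer floats whose low-order
    #     mantissa bits are effectively noise, so they compress poorly.
    normalized = [c * g for c, g in zip(counts, gains)]
    f32 = struct.pack(f"<{len(normalized)}f", *normalized)
    # (b) Integer decimation: keep the raw counts as one byte per pixel.
    #     bytes() raises ValueError if any count falls outside 0-255.
    u8 = bytes(counts)
    return {
        "float32_raw": len(f32),
        "float32_zlib": len(zlib.compress(f32, level)),
        "uint8_raw": len(u8),
        "uint8_zlib": len(zlib.compress(u8, level)),
    }


rng = random.Random(42)
n_pix = 256 * 256
# Hypothetical low-dose counting frame: about one electron per pixel on average.
counts = [sum(rng.random() < 0.25 for _ in range(4)) for _ in range(n_pix)]
# Hypothetical per-pixel gain reference, scattered around 1.0.
gains = [rng.uniform(0.9, 1.1) for _ in range(n_pix)]
sizes = frame_sizes(counts, gains)
print(sizes)
```

The uint8 frame is a quarter of the float32 size before compression and shrinks much further afterwards, because the small integer counts have low entropy. Once those counts are multiplied by a non-integer gain, that structure is lost, which is why storing integers forces gain normalization to move to the data destination.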

Abbreviations

uint8: unsigned 8-bit integer computer data, range 0–255
float32: 32-bit floating-point computer data, ~6 significant figures
Blosc: blocked, shuffle, compress library
CDF: cumulative distribution function
CMOS: complementary metal-oxide semiconductor
PDF: probability density function
GB: gigabyte
Gb: gigabit
MB: megabyte
KOMA: knock-out moving average
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
    Posted March 13, 2017.
Subject Area: Bioinformatics