EMAN2: An extensible image processing suite for electron microscopy

https://doi.org/10.1016/j.jsb.2006.05.009

Abstract

EMAN is a scientific image processing package with a particular focus on single particle reconstruction from transmission electron microscopy (TEM) images. It was first released in 1999, and new versions have been released typically 2–3 times each year since that time. EMAN2 has been under development for the last two years, with a completely refactored image processing library, and a wide range of features to make it much more flexible and extensible than EMAN1. The user-level programs are better documented, more straightforward to use, and written in the Python scripting language, so advanced users can modify the programs’ behavior without any recompilation. A completely rewritten 3D transformation class simplifies translation between Euler angle standards and symmetry conventions. The core C++ library has over 500 functions for image processing and associated tasks, and it is modular with introspection capabilities, so programmers can add new algorithms with minimal effort and programs can incorporate new capabilities automatically. Finally, a flexible new parallelism system has been designed to address the shortcomings in the rigid system in EMAN1.

Introduction

Electron cryomicroscopy (cryoEM) and single particle reconstruction have undergone dramatic growth over the last decade. This is due to a combination of improvements to equipment, computer processing power, and software. None of these advances alone would have been sufficient to produce the recent progress in this field. It is now possible to produce reconstructions of molecules and macromolecular assemblies in the range of hundreds of kilodaltons to hundreds of megadaltons at subnanometer resolutions, for example, see (Booth et al., 2004, Bottcher et al., 1997, Cheng et al., 2004, Fotin et al., 2004, Jiang et al., 2006, Ludtke et al., 2004, Matadeen et al., 1999, Zhou and Chiu, 2003). Unlike crystallography, single particle reconstruction becomes easier rather than more difficult as the size of the system increases. While tremendous strides have been made in determining the structure of individual proteins, as observed by the rapid growth of the PDB, it is clear that unlocking the secrets of the cell will require study of large systems of interacting proteins and/or RNA/DNA. The capability of producing structures from small quantities of large, fragile assemblies will continue to drive expansion of this field. Indeed, hybridizing intermediate resolution structures of assemblies from cryoEM with crystallographic reconstructions of individual components has become standard practice in recent years, for example, see (Baker et al., 2003, Fotin et al., 2004, Gao et al., 2003, Ludtke et al., 2004, Milne et al., 2002, Zhang et al., 2003), and several software packages have been designed specifically for this task (Jiang et al., 2001, Volkmann and Hanein, 1999, Wriggers et al., 1999).

Over the last three decades, a wide range of scientific image processing packages have been developed for use with electron microscopy, including SPIDER (Frank et al., 1996), IMAGIC (van Heel et al., 1996), BSOFT (Heymann, 2001), FREALIGN (Grigorieff, 1998), EM (Hegerl, 1996), IMIRS (Liang et al., 2002), SUPRIM (Schroeter and Bretaudiere, 1996), IMOD (Kremer et al., 1996), PHOELIX (Carragher et al., 1996), PFT (Baker and Cheng, 1996), the MRC reconstruction tools (Crowther et al., 1996) and Xmipp (Sorzano et al., 2004). These packages range from full featured image processing environments to sets of tools focused on specific tasks related to a specific type of reconstruction. The very breadth of software still being developed in this field demonstrates that there are still opportunities to further optimize the techniques now in use, both in quality and computational efficiency. EMAN (Ludtke et al., 1999) was first introduced in 1999, and its popularity can be largely attributed to its suite of GUI tools, general ease of use, and its capability of performing fully CTF-corrected single particle reconstructions at high resolution with a high level of automation.

The original EMAN suite has a tiered architecture, including a scientific image processing library in C++ with partial Python bindings, a set of user-level command line applications for specific tasks written in both C++ and Python, and a set of GUI tools for various specific tasks, written in C++ using the QT toolkit. It has support for parallel processing on clusters, SMP supercomputers or sets of individual workstations. The library was originally written in Objective-C, then ported to C++, and eventually linked to the Python scripting language. This rather convoluted development process left the library with no clear organizational model for incorporating new features, either for the library or end-user programs. When we began to work with the PHENIX (Adams et al., 2004, Adams et al., 2002) project to produce SPARX (see the companion piece in this issue), it became clear that many features necessary to take the next steps towards higher resolution reconstructions and development of novel techniques could not be reasonably incorporated into the original EMAN1 design. For example, introspection (the ability to query software libraries for information about the functions they contain) is a critical element in designing an extensible GUI. In typical software libraries, including EMAN1, adding a single new function requires a time-consuming recompilation of the entire software package. However, modern object-oriented design schemes permit adding new functionality while only recompiling the added code, easing the debugging process and saving hours of developer time. The process known as ‘refactoring’ involves restructuring existing code to provide the same capabilities, but with a more logical or flexible interface. Applying this process to the EMAN1 library provided the opportunity to redesign the library structure while retaining a majority of the well-tested image processing code from the original library.
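The role of introspection in an extensible design can be illustrated with a minimal Python sketch. This is not EMAN2's actual mechanism: the registry `PROCESSORS`, the decorator `register`, the helpers `describe` and `list_processors`, and the two toy processors are all hypothetical names chosen for illustration. The point is only that a client such as a GUI can discover newly added functions, their documentation, and their parameters at run time without being rewritten.

```python
import inspect

# Hypothetical registry mapping a processor name to the function implementing it.
PROCESSORS = {}


def register(name):
    """Decorator that adds a function to the registry under `name`."""
    def wrap(func):
        PROCESSORS[name] = func
        return func
    return wrap


@register("threshold")
def threshold(pixels, cutoff=0.5):
    """Set every value below `cutoff` to zero."""
    return [p if p >= cutoff else 0.0 for p in pixels]


@register("scale")
def scale(pixels, factor=2.0):
    """Multiply every value by `factor`."""
    return [p * factor for p in pixels]


def describe(name):
    """Report a registered function's docstring and its keyword parameters with defaults."""
    func = PROCESSORS[name]
    params = {
        p.name: p.default
        for p in inspect.signature(func).parameters.values()
        if p.default is not inspect.Parameter.empty
    }
    return {"name": name, "doc": inspect.getdoc(func), "params": params}


def list_processors():
    """Describe every registered processor, sorted by name."""
    return [describe(name) for name in sorted(PROCESSORS)]
```

A client could call `list_processors()` to build a menu or a parameter dialog. Adding a third decorated function would make it appear in that listing with no change to the client code, which is the property the paragraph above attributes to introspection.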

The goals of the EMAN2 refactoring were to provide easy extensibility, a complete logical Python interface, introspection capabilities for GUI integration, a properly designed documentation interface, a scheme for metadata management and built-in unit testing for reliability. For the end-user programs the aim was to remove many of the limitations present in the original EMAN1 library and to adapt knowledge developed over six years in using EMAN1 into the standard refinement processes. The EMAN2 library also provides much of the core functionality for the SPARX package (see the companion piece in this issue), though SPARX is now also integrating capabilities from other software suites to produce a flexible environment for cryoEM software development.

The philosophy behind EMAN is to provide a continuously updated set of tools representing the current state of the art in single particle reconstruction, packaged in an easy to use environment, permitting structures to be solved with a high level of confidence at the highest possible resolution. EMAN2 will eventually replace EMAN1 entirely, though EMAN2 was designed to coexist with EMAN1 to permit a gradual transition and not disrupt active research projects.

Python is a full-featured object oriented scripting language (http://www.python.org). Over recent years, it has become the de facto standard for a wide range of scientific software packages. For structural biology, this largely began with the visualization community, first with packages like Chimera (Pettersen et al., 2004), Vision (Sanner et al., 2002) and Pymol (http://pymol.sourceforge.net), which are written largely in Python with supporting libraries in C/C++. The trend then continued such that a vast majority of scientific visualization tools now offer Python bindings in some form. Unlike statically typed languages such as Java, which force the end user to write rigorously designed, highly structured programs, Python has a relaxed, yet powerful, structure focused on getting results quickly and flexibly. For typical scientific end-users, who are not interested in writing large applications but simply want a small script for their own use to achieve a particular result efficiently, Python is ideal. Its language structure is very easy to learn, and the fact that it has become so widely used means that a small investment in learning the basic language syntax immediately provides the user with new capabilities in a wide range of software.

While Python is flexible and full featured, it is still an interpreted scripting language, meaning its performance is substantially worse than that of compiled languages such as C++/Fortran. For this reason, Python is not directly used for low-level, compute-intensive tasks. Rather, a set of basic functions is coded in C/C++/Fortran, then provided to Python as a callable library. For example, in EMAN2, if one wished to calculate the FFT of an image, a Python image object would be sent to a C++ FFT routine, then the result would be returned as a new Python image object. This process is, of course, transparent to the user, who simply enters ‘fftimg = img.do_fft()’. Using this design methodology, Python provides a host of advantages in writing user-level programs with a negligible impact on performance.
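The calling pattern described here, a thin Python object that hands its data to a low-level routine and wraps the result in a new object, can be sketched as follows. This is a self-contained illustration under stated assumptions, not EMAN2 code: the `Image` class and the `_dft_backend` function are hypothetical, and the backend is a naive pure-Python discrete Fourier transform standing in for the compiled C++ FFT that EMAN2 would call. Only the method name `do_fft` is taken from the text.

```python
import cmath


def _dft_backend(samples):
    """Stand-in for a compiled FFT routine: a naive O(n^2) DFT in pure Python."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


class Image:
    """Hypothetical 1D image object; only the calling pattern mirrors the text."""

    def __init__(self, data):
        self.data = list(data)

    def do_fft(self):
        # Pass the raw data to the backend, then wrap the result in a new object
        # so the caller never handles the low-level representation directly.
        return Image(_dft_backend(self.data))


img = Image([1.0, 0.0, 0.0, 0.0])  # a unit impulse
fftimg = img.do_fft()              # the one-line call quoted in the text
```

The user-facing call stays a single line regardless of how the backend is implemented. In a real binding the backend would be compiled code, so the interpreter overhead is limited to the call itself rather than the per-pixel arithmetic.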

EMAN2 design

The overall design of EMAN2 is shown in Fig. 1. The suite consists of

  • C++ Core Library—The library includes over 500 high-performance image processing routines and associated classes for related operations. The library makes use of only the most widely supported C++ features to ease portability.

  • Python Bindings—The full C++ library is made available through the Python language, with a calling syntax almost identical to the syntax used from C++. From Python the library appears like any other

Conclusions

EMAN2 represents a substantial advancement over EMAN1 in extensibility, ease of use and capabilities. As described earlier, the transition from EMAN1 to EMAN2 will be gradual. The entire EMAN2 core library and Python bindings are fully functional. Basic GUI widgets exist, though they are not yet feature-complete. A number of programs have already been ported from EMAN1, and several new programs have been created. Prerelease versions of the library are updated daily, and are available from //blake.bcm.tmc.edu/EMAN2

Acknowledgments

We thank Pawel Penczek and Chao Yang for their contributions to EMAN2 via the SPARX project. Current EMAN2 development has been supported by NIH P41RR02250, P01GM064692 and P01AI055672. Continuing development is supported by R01GM080139.
