ABSTRACT
The capacity to map traits over large cohorts of individuals – phenomics – lags far behind the explosive development in genomics. For microbes, the estimation of growth is the key phenotype. We introduce an automated microbial phenomics framework that delivers accurate and highly resolved growth phenotypes at an unprecedented scale. Advancements were achieved through the introduction of transmissive scanning hardware and software technology, frequent acquisition of precise colony population size measurements, extraction of population growth rates from growth curves, and removal of spatial bias by reference-surface normalization. Our prototype arrangement automatically records and analyses 100,000 experiments in parallel. We demonstrate the power of the approach by extending and nuancing the known salt defence biology in baker’s yeast. The introduced framework will have a transformative impact by providing high-quality microbial phenomics data for extensive cohorts of individuals and by generating well-populated and standardized phenomics databases.
INTRODUCTION
While our ability to detect genetic variation has improved tremendously (Koboldt et al. 2013; Mardis 2013), our capacity to rapidly and accurately map the phenotypic effects of this variation in large population cohorts – phenomics – has not made comparable gains (Houle et al. 2010). This is also true for microbes, where estimation of fitness components is the key entry point when searching for the concerted effects of causative genetic variation. As most genetic or environmental effects on net fitness are modest in size (Thatcher et al. 1998; Wagner 2000), this requires measurement methodology capable of capturing subtle differences.
Micro-colony analysis estimates variation within populations (Levy et al. 2012) but has low throughput for differences between populations. Miniaturization of liquid cultures allows automation and highly accurate measurement of population density (Warringer and Blomberg 2003, 2014), but its associated costs in time and manpower are too high to allow massive scale-up. Tagging strains with DNA sequence barcodes and monitoring tag frequencies in complex strain mixtures allows high throughput, but with lower accuracy and high initial investment (Giaever et al. 2002; Winzeler et al. 1999). Better cost-efficiency can be achieved using ordered arrays of microbes that are cultivated as colonies on solid media and whose colony areas are estimated using cameras (Costanzo et al. 2010; Kvitek et al. 2008; Bean et al. 2014; Lawless et al. 2010). This set-up is attractive in that microbes are surveyed in a growth mode that better resembles their natural state, but the environment can rarely be kept constant across plates. This leads to large spatial biases and false results. Spatial bias correction is a formidable challenge (Baryshnikova et al. 2010) because colonies exchange nutrients and toxins with the local environment and thereby affect each other. Compounding these problems, the 3-dimensional colony population size is typically approximated by measuring the 2-dimensional colony area, which often correlates poorly with it (Pipe and Grimson 2008). Furthermore, growth tends to be estimated from a single or a few measurements (Tong et al. 2001; Tong et al. 2004). Because any particular colony size can be reached via very many different growth paths (Fig 1A), single-point measures unavoidably create false positives and false negatives, and can lead to wrong conclusions.
A) Single time point measures at different stages of growth (vertical broken lines) provide wildly diverging views of the relative growth performance of strains, variously scoring all genotypes as identical (t2), genotype 1 as superior (t1) or genotype 4 as superior (t3). This illustrates how effects on distinct aspects of the cellular physiology of different genotypes can be falsely called in the absence of dynamic growth data. B) Scan-o-matic is physically based on high-quality desktop scanners, with four solid media plates fixed on top of the scanning surface by custom-made acrylic glass fixtures. C) Four solid media plates, each containing 96 to 1536 microbial colonies, are fixed on each scanner by the glass fixture. Three black & white orientation markers (red arrows) on the fixture allow precise tracking of pixel positions, and a fixed transmissive grey-scale calibration strip (blue arrow) permits calibration of pixel intensities between scans and between scanners. D) Colony population size is extracted from raw images in a multi-step procedure: from the raw image, to detection and segmentation of probable colonies (blobs; blue), definition of the local background (blue) with a safety margin to the colony, and estimation of cells per pixel from pixel intensity relative to the background. Colour intensity ranges from 0 (dark blue) to 1500 (turquoise) cells per pixel. E) Colony population growth curves obtained by cultivating genetically identical WT colonies in four environmental contexts and measuring colony population size at 20-minute intervals. Y-axis is on a log2 scale.
The near-term goal of microbial phenomics is therefore to achieve the accuracy of liquid microcultivation in a solid media cultivation mode. To reach this goal the 3-dimensional topology of microbial colonies must be captured at high spatial and temporal resolution and reduced into accurate estimates of colony population sizes while exhaustively compensating for local environmental variations. Here we present a novel microbial phenomics methodology dubbed Scan-o-matic that overcomes the above hurdles and demonstrate its utility on the eukaryotic model organism baker’s yeast, Saccharomyces cerevisiae.
RESULTS
A novel framework for high-resolution microbial phenomics on solid media
To develop a high-resolution microbial growth phenomics framework for the average microbiology lab, we established a pipeline based on high-quality, mass-produced desktop scanning technology (Fig 1B). The scanners allow transmissive light capture, enabling precise estimates of cell counts in microbial colonies (see below). Scans are initiated at pre-programmed intervals by means of a power manager that switches off scanner lamps immediately after scanning. This drastically reduces the spatial bias that would otherwise arise because colonies closer to the lamp parking position are exposed to excess light and heat. Four solid media microcultivation plates, each containing from 96 to 1536 colonies, fit in each scanner (Fig 1C). Given initial instructions, Scan-o-matic autonomously completes the entire pipeline from data acquisition to growth feature extraction (Fig S1, S2). Plates are fixed within an acrylic glass fixture, and their precise positions are automatically detected using orientation markers (Fig S3). Each fixture contains a transmissive grey-scale calibration strip that compensates for variations in pixel intensities between scans, which would otherwise be a major obstacle to robust estimation of colony population size (Fig S4). Scan-o-matic segments images and robustly localizes colonies, avoiding misclassification caused by dust particles, plastic deformations or scratches (Fig 1D, S5, S6). Colony pixel intensities are compared to local background pixel intensities and converted via a calibration function into cell counts per pixel (Fig S7). These are summed over the colony to produce precise estimates of colony population size. The high frequency of estimates translates into well-resolved growth curves that are quality controlled with a semi-automated user interface (Fig S8).
A) Overview of the Scan-o-matic hardware–software arrangement and interaction. Three Epson Perfection V700 PHOTO scanners (Epson Corporation, UK) are connected via USB to each controlling computer. The power supply of each scanner is controlled individually and independently of the computer using a single GEMBIRD EnerGenie PowerManager LAN with multiple sockets (Gembird Ltd, the Netherlands). Software structure is shown inside the blue rectangle. Experiments on the different scanners connected to one computer are run as separate autonomous processes, but scanner power supply toggling is coordinated by the server. This ensures that a malfunction in one scanner does not affect other scanners connected to the same computer. B) Overview of the first pass analysis. The first pass analysis identifies the position of orientation markers within each image and matches the recorded positions to positions in a stored model of the fixture used (see Fig S2B and S4A). The match is used to precisely determine the positions of each of the four plates and of the transmissive scale calibration strip in the respective image (see Fig S3B). These are sectioned out and stored as separate image features. The transmissive scale is further analysed to establish the positions of its segments (Fig S4A), which are used to calibrate pixel intensities so that these become independent of variations in scanner properties over time, across the scanning surface and between scanners (Fig S4B). Note that the first pass analysis is performed image by image as images are acquired, and is completed, reported and stored rapidly after the acquisition of each image. The experiment process then rests until the next scan is expected. C) Overview of the second pass analysis. The second pass analysis detects colonies and quantifies their cell density in each image. It is performed in reverse time order, i.e. starting from the last image in each time series, after all images have been acquired.
First, a grid corresponding to the pinning density used is placed on top of each plate image, and the grid intersections are precisely aligned with the centre points of candidate colonies (Fig S5). Second, each grid intersection region is segmented so that colonies are distinguished from background, i.e. pixels are assigned to colonies and local backgrounds respectively (Fig S6). The intensity differences between each colony pixel and the average pixel value of its local background are converted into cell density by calibration to empirical measures (Fig S7). The complete process is iterated for all 1536 colonies in each of the four plates, for each of the images (145 for a 48h experiment with 20 min measurement intervals), starting from the last image.
A) User interface for initiating a Scan-o-matic experiment. The parameters set are: experiment duration (from 14 minutes to 7 days, but typically 2–3 days), time interval between consecutive scans (from 7 to 180 minutes, but typically 20 minutes) and the pinning density of each plate (96, 384, 1536, 6144). The four plates in each scanner are by necessity subject to the same duration and interval settings. The experiment is initiated from the interface and runs until completion. B) Scan-o-matic fixture calibration user interface. Before experiments, once per scanner–fixture combination, a spatial calibration model that describes the layout of the fixture needs to be created using this interface. Scan-o-matic uses this fixture calibration model to section images of the fixture into their meaningful components. The fixture name and the number of fixture orientation markers are given. Plate positions and the position of the transmissive scale calibration strip are marked by interactive click-and-drag mouse maneuvers. The type of calibration target is selectable from a drop-down menu.
A) Localizing orientation markers within each individual image. Each image, I, is thresholded at a pixel value of 127 (the midpoint of the value range), resulting in a two-colour image (dark and light blue). The thresholded image is convolved, using a fast Fourier transformation, with a black and white representation of a fixture orientation marker (upper right corner). This evaluates, for each pixel of the image, how well the local context of that pixel matches the representation of a fixture orientation marker. A strong signal is obtained when the majority of the adjacent pixels correspond to the expected two-colour marker pattern. The position with the strongest signal is selected as the centre position (red), c, of a fixture orientation marker and stored as a marker coordinate. To avoid repeated reporting of the same marker, the local area corresponding to the size of the convolution kernel is then set to one less than the minimum of the convolution surface. Marker localization is repeated on the modified surface until all markers have been found, i.e. for as many iterations as there are fixture orientation markers in the fixture model. The mass centre of all three markers (red star) is calculated to serve as a fixed reference point. B) Localizing and sectioning out plates and transmissive scale calibration strips. Orientation marker coordinates are matched between the image and the fixture calibration model, and the offset vector between the two-dimensional centres of mass of the two coordinate systems (image and fixture calibration model) is obtained. The image’s instance of the fixture model is adjusted by this offset vector so that a near perfect match between image and model sections is obtained. Each relevant image feature (plates and calibration target areas) is represented by four vectors (dark blue and light blue arrows) from the fixed reference point (two-dimensional mass centre, red star).
Each relevant image feature is sectioned out and stored separately for later analysis (see Fig S4–S8).
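The marker search in panel A above can be sketched in Python with NumPy. This is a minimal illustration, not Scan-o-matic's own code: the function name `find_orientation_markers`, the ±1 encoding of the two-colour images, and the restriction to offsets where the whole template fits inside the image are our assumptions. The match score is computed as an FFT-based cross-correlation (equivalent to convolving with the flipped marker template), after which the strongest peak is taken and a kernel-sized area is suppressed, once per marker, as described.

```python
import numpy as np

def find_orientation_markers(image, marker, n_markers, threshold=127):
    """Locate n_markers copies of `marker` in `image` (hypothetical helper).

    Returns the (row, col) centre of each marker and their mass centre.
    """
    # Two-colour versions: +1 for light pixels, -1 for dark pixels, so a
    # perfect match scores marker.size and every mismatching pixel costs 2.
    img = np.where(np.asarray(image) > threshold, 1.0, -1.0)
    ker = np.where(np.asarray(marker) > threshold, 1.0, -1.0)
    h, w = img.shape
    kh, kw = ker.shape
    shape = (h + kh - 1, w + kw - 1)  # zero-pad so the FFT does not wrap around
    # FFT cross-correlation: corr[dy, dx] = sum(img[y + dy, x + dx] * ker[y, x]).
    spec = np.fft.rfft2(img, shape) * np.conj(np.fft.rfft2(ker, shape))
    corr = np.fft.irfft2(spec, shape)
    # Keep only offsets where the whole template lies inside the image.
    corr = corr[: h - kh + 1, : w - kw + 1].copy()
    centres = []
    for _ in range(n_markers):
        dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
        centres.append((int(dy) + kh // 2, int(dx) + kw // 2))
        # Set a kernel-sized area to one less than the surface minimum so
        # the same marker is not reported again.
        floor = corr.min() - 1
        corr[max(dy - kh // 2, 0): dy + kh // 2 + 1,
             max(dx - kw // 2, 0): dx + kw // 2 + 1] = floor
    centre_of_mass = tuple(np.mean(centres, axis=0))
    return centres, centre_of_mass
```

The returned mass centre plays the role of the fixed reference point (red star) from which plate and calibration-strip positions would then be offset.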
A) The stored image section corresponding to a transmissive scale is sequentially trimmed, first in the orthogonal and then in the parallel direction, to exclude all pixels not belonging to the transmissive scale proper (red rectangle). Transmissive scale segment centres are localized (red circles). The median of the pixel intensities around each segment centre is taken as the best representation of the pixel intensity of that segment. B) Supplier-provided calibration target values are available for each segment (x-axis). Together with the pixel intensities of the detected transmissive scale segments (y-axis), these are used to construct a third-degree polynomial function that is specific to each transmissive scale calibration strip and image. The polynomial translates observed pixel intensities into a normalized value space of calibrated pixel intensities. These are used for all further analyses (see Fig S5).
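The calibration in panel B amounts to a cubic least-squares fit from observed segment intensities to the supplier's target values. A minimal NumPy sketch, assuming the pixel patches around each segment centre have already been cut out; `fit_grayscale_calibration` is a hypothetical name, not part of Scan-o-matic.

```python
import numpy as np

def fit_grayscale_calibration(segment_patches, target_values):
    """Return a cubic mapping observed pixel intensity -> calibrated value.

    segment_patches: one array of pixels around each segment centre.
    target_values:   supplier-provided value for each segment, same order.
    """
    # Median of each patch is taken as that segment's observed intensity.
    observed = np.array([np.median(p) for p in segment_patches], dtype=float)
    # Third-degree least-squares polynomial, specific to this strip and image.
    coeffs = np.polyfit(observed, np.asarray(target_values, dtype=float), deg=3)
    return np.poly1d(coeffs)
```

The returned `np.poly1d` can be applied element-wise to a whole plate section, e.g. `calibrated = calib(plate_pixels)`, before any colony analysis.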
The first step in the second pass analysis places a virtual grid that matches the user-supplied pinning matrix over the raw image of each plate and aligns grid intersections with the centres of candidate colonies. A) Raw image of a plate. B) Otsu-thresholded surface of a plate section, corresponding to a zoom-in on a candidate colony on the plate in A. Each image is first partitioned into sections by a randomly seeded Voronoi diagram, and each section is then Otsu-thresholded. In each section, the pixel intensities of background pixels form one Gaussian distribution and those of colony pixels form another. For each such mixture, the Otsu threshold defines the intensity that optimally separates the two component distributions into candidate colony (blob) pixels and background pixels. The resulting surface of threshold values over all sections of the plate is then heavily smoothed. C) Thresholds are applied to call blob pixels (red) as distinct from local background pixels (blue). Note the thin red line of false positives at the bottom of the image. D) After applying the Otsu threshold, aggregations of blob pixels are simplified and smoothed using iterations of binary erosion and binary propagation. This re-assigns some pixels at the blob/background boundary. E) All blobs are evaluated for size: they must be larger than 40 pixels in area and cover fewer pixels than the square of the expected distance between colonies. Note that this removes the thin line of false positives at the bottom of the image. F) All remaining blobs are evaluated for shape: they are expected to be circular, because of the shape of the pin head that places the cells on the plate and because growth is expected to radiate approximately evenly, so their bounding box is expected to be roughly square. Note the removal (blue rectangle) of a blob that could not unambiguously be established as a true colony.
G) The average empirical spacing between pins in the pin format used is used to construct an idealized grid. This idealized grid assumes that all pin centres, and consequently the centres of all deposited colonies, are separated by this average distance. The idealized grid is fitted to the stored blob array such that the sum of errors between the grid intersections of the idealized grid and the blob centre positions is minimized. Grid intersections of the idealized grid are finally aligned with the nearest blob mass centre, provided that the nearest blob is close enough (its squared distance must be less than a heuristically set threshold of 105 pixels). H) Detailed view of a placed grid. Grid intersections (white circles) and blob centres (light blue crosses) are indicated.
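A simplified sketch of steps G and H, assuming blob mass centres are already available as (row, column) pixel coordinates. Instead of an explicit error-minimising fit, this hypothetical `fit_grid` estimates the grid offset from the circular mean of the centre positions modulo the pin spacing, and it assumes that the first row and column of colonies are present; only the snapping rule (squared distance below 105) is taken from the text.

```python
import numpy as np

def fit_grid(blob_centres, n_rows, n_cols, spacing, max_sq_dist=105.0):
    """Place an idealized pin grid and snap its nodes to nearby blob centres.

    blob_centres: (N, 2) array of (row, col) blob mass centres in pixels.
    Returns an (n_rows, n_cols, 2) array of grid intersection positions.
    """
    pts = np.asarray(blob_centres, dtype=float)
    # Phase of every centre modulo the pin spacing, averaged on the circle so
    # that positions just below and just above a grid line agree.
    angles = 2 * np.pi * pts / spacing
    phase = np.angle(np.exp(1j * angles).mean(axis=0)) * spacing / (2 * np.pi)
    # Anchor the first grid line at the phase-consistent position nearest the
    # smallest observed coordinate (assumes the first row/column is occupied).
    origin = phase + spacing * np.round((pts.min(axis=0) - phase) / spacing)
    rr, cc = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    grid = np.stack([origin[0] + rr * spacing, origin[1] + cc * spacing], axis=-1)
    # Snap each intersection to its nearest blob if that blob is close enough;
    # intersections without a nearby blob keep their idealized position.
    flat = grid.reshape(-1, 2)
    d2 = ((flat[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    close = d2[np.arange(len(flat)), nearest] < max_sq_dist
    flat[close] = pts[nearest[close]]
    return flat.reshape(n_rows, n_cols, 2)
```

Keeping the idealized position for empty intersections means a colony that failed to grow still receives a grid position, which the later segmentation step can then evaluate.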
A) Raw image of a random colony and its local background. B) Distribution of pixel intensities at the grid intersection shown in A. Scan-o-matic assumes that colony pixel intensities and local background pixel intensities follow two overlapping distributions. Pixels are assigned to colony and local background respectively based on their intensity, as described below. C) Pixel intensities in three local areas (centred on three grid intersections) of an image. Dark red = high intensity, dark blue = low intensity. The areas reflect three complex challenges: hair (top panels), specks of dust (centre panels), and distortions in the transparency of the plastic casting (lower panels). The landscape of raw pixel intensities (far left panels) is first smoothed with a median filter (3×3 pixels) to reduce noise (left panels). Note that this greatly, but not completely, removes the impact of the challenging elements. Smoothed pixel intensities are then thresholded with an Otsu threshold, i.e. assigned as candidate colony (blob) pixels if they pass a local threshold. To accurately define blobs and ensure that the edge of each colony is included in the colony’s definition, blobs are passed through iterations of binary dilation. This re-assigns some pixels at the boundary of each blob. Blobs falling fully or partially inside other blobs are merged. The largest, most circular blob is designated as the colony; for every image except the first one analysed, it must also best agree with the colony blob of the previous iteration (images are analysed from last to first). Other blobs are placed in a second array as trashed pixels. Trashed pixels correspond to specks of dust, scratches on the plastic, or similar types of noise, and are not considered further. Note that this completely removes any remaining influence of the challenging elements (right panels, blob pixels in red).
The local background (far right panels, background pixels in blue) is then defined as the complement of the union of the blob and trash arrays, eroded to leave a safety margin. The interquartile mean of the local background (the mean of the pixel values within the interquartile range, IQR) is used as a scalar measure of the transparency of the solid medium. The difference between the pixel values of the blob and this local background value represents the per-pixel darkening effect of the cells in the corresponding area of the plate. These values are retained for downstream use (see Fig S7).
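The thresholding and background steps above can be illustrated with plain NumPy. In this sketch, which is not the Scan-o-matic implementation, `otsu_threshold` uses the standard formulation of Otsu's method (the cut that maximizes the between-class variance of a histogram), and `colony_darkening` takes already-resolved blob and trash masks, erodes the remaining background to leave a safety margin (a 2-pixel, 4-neighbour erosion is our assumption) and uses the interquartile mean as the background value. Median filtering, binary dilation, blob merging and the selection of the largest, most circular blob are omitted; all names are hypothetical.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Otsu's method: the cut maximizing between-class variance."""
    hist, edges = np.histogram(np.ravel(values), bins=nbins)
    centres = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist).astype(float)           # pixels at or below each bin
    w1 = w0[-1] - w0                              # pixels above each bin
    s0 = np.cumsum(hist * centres)
    mu0 = s0 / np.maximum(w0, 1)                  # lower-class mean
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1)       # upper-class mean
    between = w0 * w1 * (mu0 - mu1) ** 2
    return edges[np.argmax(between) + 1]          # upper edge of last lower-class bin

def erode(mask, iterations=1):
    """4-neighbour binary erosion; pixels outside the patch count as True."""
    m = mask.copy()
    for _ in range(iterations):
        p = np.pad(m, 1, constant_values=True)
        m = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return m

def colony_darkening(patch, blob, trash, margin=2):
    """Per-pixel blob minus background value, and the background value."""
    # Background = not blob and not trash, shrunk away from both by `margin`.
    background = erode(~(blob | trash), margin)
    bg = patch[background]
    q1, q3 = np.percentile(bg, [25, 75])
    bg_value = bg[(bg >= q1) & (bg <= q3)].mean()  # interquartile mean
    return patch[blob] - bg_value, bg_value
```

Which side of the threshold counts as colony depends on the intensity convention after calibration; here the blob mask is simply passed in.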
The transformation from cell darkening values to actual cell densities is based on a calibration experiment in which 42 colonies with a wide range of sizes were analysed in Scan-o-matic according to the procedures described above. Solid medium (agar) pieces with each colony on top were then carefully removed from the plate, colonies were washed off the agar into sterile water by extensive vortexing, and the actual number of cells in each colony was estimated using two independent techniques. First, OD600 was measured in a spectrophotometer (Pharmacia Biotech Novaspec II) and transformed into cell density, assuming that 1 mL of culture at OD = 1.00 corresponds to 10^7 cells. Second, cells were sonicated and counted using a fluorescence-activated cell sorter (FACS; BD FACSAria). A) The two empirical cell density measures (OD and FACS) showed a close to perfect linear correlation, indicating that they accurately capture, or at least scale linearly with, the real cell numbers in each colony. B) OD measures were used to transform pixel darkening values (calibrated pixel intensities) into cell density, assuming a polynomial relation. The best-fit polynomial was used: y = 3.38×10^-5 x^5 + 49.0x (coefficients rounded for display; higher precision was used in the actual computation), for x > 0, where y = cell density and x = pixel darkening value. Intermediate-degree terms are omitted to avoid over-fitting the curve. Pixels with x < 0 are set to x = 0 before applying the polynomial. C) To verify the accuracy of the polynomial transformation as a concept, 32 of the 42 colonies were randomly selected and a new polynomial was constructed from them. This new polynomial was used to estimate the cell densities of the remaining 10 colonies as explained above, and the estimates were compared to their measured densities. A close to perfect 1:1 correlation was observed.
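Applying the calibration polynomial from panel B and summing over a colony's pixels gives its population size. A minimal sketch that uses the display-rounded coefficients quoted above, so its results will differ slightly from the higher-precision fit; the function names are hypothetical.

```python
import numpy as np

# Display-rounded coefficients from the text: y = 3.38e-5 * x**5 + 49.0 * x.
C5, C1 = 3.38e-5, 49.0

def cells_per_pixel(darkening):
    """Convert per-pixel darkening values into cells per pixel."""
    x = np.clip(np.asarray(darkening, dtype=float), 0.0, None)  # x < 0 -> 0
    return C5 * x ** 5 + C1 * x

def colony_population_size(darkening):
    """Sum cells over all pixels of one colony."""
    return float(np.sum(cells_per_pixel(darkening)))
```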
To give a visual representation of the distribution of phenotypes over a plate, we designed a user interface that shows each plate as a two-dimensional heatmap. The coordinates reflect colony positions and the colour intensity reflects the phenotype value. In the lower part of the display, the growth curve of the currently selected colony is displayed together with phenotype information. The plate selector allows toggling between the four plates of the fixture, one at a time. The phenotype drop-down selector allows selection of which phenotype to display in the heatmap. The key output phenotype is the generation time (population size doubling time). Other phenotypes are: “Error of GT”, i.e. the error of the linear regression at the time of population size doubling time extraction; the “Fit of the modified Chapman-Richards Model”; and “time at maximal growth rate”. A large error of the linear regression, a poor model fit, and a very early or very late extraction of the population size doubling time suggest growth curves of low quality. Such suspected low-quality growth curves can and should be manually inspected by selecting individual positions on the heatmap. Scan-o-matic automatically displays the colonies in ascending order of quality to ease inspection. Low-quality growth curves should be discarded before normalization so that they do not influence the normalization procedure. We urge users to be conservative in retaining growth curves, in particular if replication is high.
Cultivation over a range of environmental contexts showed the vast majority (>99.5%) of growth curves to have the classical sigmoid shape, to be minimally affected by noise and systematic artefacts, and to have distinct environment-specific properties (Fig 1E). Following a lag phase of variable length, early growth was near exponential, and in unstressed environments it corresponded to a raw population size doubling time of 2.0h (coefficient of variation, CV = 4.9%, n = 1536). Carbon-limited populations growing on simple sugar sources reached distinct stationary phases when the carbon was depleted. Curve shapes were independent of pinning format (384 or 1536 colonies) (Fig S9A). Final cell number increased at lower colony pinning density, presumably due to reduced competition for carbon and energy. However, initial population size was smaller in the denser format, as the smaller pin heads deposited fewer cells. The net effect was that the 1536 format provided a longer growth span than the 384 format (mean of 5.1 vs. 4.9 doublings, p < 10^-8), faster growth and more stable estimates of the growth rate (Fig S9B).
Genetically identical (WT, BY4741) colonies were pinned onto four identical unstressed plates (synthetic defined medium) using either 1536 pins (n = 2 plates) or 384 pins (n = 2 plates). Note that the variation as a fraction of the signal, i.e. the CV, was much smaller for the 1536 format (mean CV of 4.9% for 1536 vs. 8.2% for 384). A) Random growth curves from 1536-pinned (n = 10) and 384-pinned (n = 10) plates. B) Mean population size doubling time for each plate. Error bars = SEM (n = 1536 or 384).
The trade-off between image resolution and image acquisition time currently makes high-quality data beyond the 1536 format unattainable. Each scanner handles four plates and each computer controls three scanners; thus, even our prototype arrangement with five computers and 15 scanners can run more than 92,000 individual growth experiments in parallel in the 1536 format. This throughput is orders of magnitude higher than what can be achieved by liquid microcultivation.
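The parallel capacity of the prototype follows directly from the hardware counts given above:

```latex
\underbrace{5}_{\text{computers}} \times \underbrace{3}_{\text{scanners per computer}} \times \underbrace{4}_{\text{plates per scanner}} \times \underbrace{1536}_{\text{colonies per plate}} = 92{,}160\ \text{colonies}
```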
Enhanced measurement precision and accuracy of colony population size and growth rate
Highly time-resolved growth data offer decisive conceptual advantages, since distinct physiological states can be analysed (Fig 1A). To test whether our approach also offers technical advances, we compared its precision and accuracy to that of the current microbial phenomics standard on solid media: measures of the 2-dimensional area covered by the colony at a single point in time. On 1536 pinning format plates containing genetically identical cultures (wildtype, WT), colony population size growth curves contained less random noise than their colony area counterparts (Fig 2A). Using the standard error of the regression at the time of maximal growth as a measure of random curve noise, the population size growth curves were roughly twice as robust as the colony area growth curves (Fig 2B). This drastic reduction in random noise was expected, because 2-dimensional colony area measures by definition contain less information, and less well-resolved information, than population size measures. This difference in information content increases as growth proceeds, reaching a maximum at entry into stationary phase. At this point, colony growth measured as the image area covered by the colony corresponded to a single doubling on average, whereas colony population size typically corresponded to more than four doublings (Fig 2A). Thus, the critical measure of precision, random noise as a fraction of signal strength (CV), strongly favours colony population size growth curves.
A-B) Comparing random noise in growth curves based on either colony area or colony population size. A) Growth curves of two sample colonies, labelled 1 and 2. Y-axes are on a log2 scale. B) Estimating growth curve noise in the critical section of the curve where growth is maximal. Noise was measured as the standard error of the regression corresponding to the highest slope. The mean of 1536 genetically identical WT growth curves in an unstressed environment is shown. Error bars = SEM. C-D) Accuracy (sum of random noise and systematic bias) over a plate, measured as the coefficient of variation across 1536 genetically identical WT colonies. C) Accuracy as a function of time, for colony area and for colony population size growth rate. A single plate of unstressed populations is depicted. Y-axis is on a log2 scale. D) Accuracy for a single measure of colony area at the end of growth (48h), colony area growth rate and colony population size growth rate. The mean of four plates with different stresses (2% glucose and 2% galactose, with and without 0.85M NaCl) is shown. Error bars = SEM. E) Left panel: Visual representation of the edge effect for colony area at the end of growth (48h). Each square corresponds to one of 1536 genetically identical (WT) colonies grown in the absence of stress. Colour intensity shows colony area, dark blue = 500 and red = 1800 pixels. Bold squares indicate colonies highlighted in the right panel. Right panel: Population size growth curves for the colonies indicated in the left panel. Curve colour matches the colour of the highlighted squares in the left panel. Time points of maximal growth rate and of the 48h measures are indicated (broken lines). F) Raw image of a corner section of a 1536 plate with genetically identical (WT) colonies growing in the absence of stress, at 48h and at the time of maximal growth rate (5h).
Precision provides an incomplete picture of technical performance because it overlooks systematic bias. Considering the total measurement error on a plate, estimated as the standard deviation over genetically identical colonies divided by their mean (CV), showed that accuracy shifts dramatically with growth state (Fig 2C). Accuracy was initially low for both area and population size, primarily due to large variation in the robotic delivery of cells, but also due to the larger influence of randomness when few pixels represented each colony (initially ~100). The total measurement error for colony area on a plate steadily decreased to reach a minimum late in the growth phase. Thereafter it rapidly increased as competition intensified, favouring growth of outer frame colonies with fewer neighbours. Variation in growth rates obtained from colony population size began similarly high, but dropped dramatically to a very low minimum, well below that of single-point measures, in the early exponential growth phase, when growth rates are maximal. We therefore extracted the maximal growth rate, expressed as a population size doubling time, as the most stable key feature of growth curves.
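One way to extract the maximal growth rate and its doubling time from a growth curve is a sliding-window linear regression on log2-transformed population sizes, keeping the window with the steepest slope. The sketch below is an illustration under stated assumptions, not necessarily the procedure Scan-o-matic uses: the hypothetical `max_rate_doubling_time` uses an arbitrary window of five measurements, and its residual standard error corresponds only loosely to the "Error of GT" quality measure.

```python
import numpy as np

def max_rate_doubling_time(times_h, pop_size, window=5):
    """Doubling time (h) at the maximal growth rate of one growth curve.

    Fits a straight line to log2(population size) in every run of `window`
    consecutive measurements and keeps the steepest one. Returns the doubling
    time, the mid-time of that window and the residual standard error of the
    fit. Assumes the curve shows positive growth somewhere.
    """
    t = np.asarray(times_h, dtype=float)
    y = np.log2(np.asarray(pop_size, dtype=float))
    best = None
    for i in range(len(t) - window + 1):
        tt, yy = t[i:i + window], y[i:i + window]
        slope, intercept = np.polyfit(tt, yy, 1)  # slope in doublings per hour
        if best is None or slope > best[0]:
            resid = yy - (slope * tt + intercept)
            se = np.sqrt(np.sum(resid ** 2) / (window - 2))
            best = (slope, tt.mean(), se)
    slope, t_mid, se = best
    return 1.0 / slope, t_mid, se
```

Working on the log2 scale makes the slope directly interpretable as doublings per hour, so its reciprocal is the population size doubling time.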
Considering six diverse environments, population size doubling times were estimated with about twice the accuracy of single measures of colony area at the end of growth (Fig 2D). This increase in accuracy is partly due to the lack of competition between colonies at the time at which growth is maximal and thus lower spatial bias (Fig 2E, F). The avoidance of colony competition effects represents not only a quantitative but also a qualitative advancement by averting confounding comparisons of strains in different physiological states (Fig 1A).
Comprehensive removal of spatial bias by reference-surface normalisation
Under the common, but naïve, null hypothesis that all systematic bias has been removed by the experimental design, we would expect ≈5% false positives for each of the above single-genotype (WT) tests (α = 0.05, Student’s t-tests). However, even after extensive technical optimization and using the most robust measure of growth (population size doubling times) to minimize error, we scored many more (~10×) false positives than expected by chance (Fig 3A). This suggested that substantial systematic bias remained. The change in the error-to-signal ratio over time (Fig 2C) indicated that two distinct types of error dominate population size doubling time estimates: an initial bias from the number of cells deposited on plates, which decreases in magnitude as colony population size increases, and a later, increasing bias that derives from intensifying competition between neighbouring colonies (Fig 2E). Of these, the former is the more important, as reflected in a strong correlation between population size doubling times and initial population size (Fig 3B). However, bias from initial population size only explained about half of the error, as shown by the false positives remaining after normalizing population size doubling times to the initially deposited number of cells (Fig 3A).
A) Fraction of false positives due to spatial bias within plates with genetically identical colonies (WT). Each plate corresponds to one distinct environmental challenge. On each plate, population size doubling times of immediately adjacent colonies (excluding every 4th position, which was used as a control position) were statistically compared to those of non-adjacent colonies using a one-sample Student’s t-test (H0 = zero difference, α = 0.05). Assuming all variation to be random, i.e. no spatial bias, the expectation is 5% false positives (broken line) at this significance cut-off. Any excess of false positives corresponds to spatial bias. Pink bars = before normalization, light blue bars = after normalization to initial population size, dark blue bars = after reference grid normalization. “All” indicates the mean of false positives over all six plates, with error bars = SEM. B) Population size doubling time as a function of initial population size. Left panel = before normalization, right panel = after reference grid normalization. All individual estimates over four of the six genetically homogeneous 1536 plates with different environmental challenges (2% glucose and 2% galactose, with and without 0.85M NaCl) are shown. C) Spatial bias is removed by reference grid normalization. Genetically identical reference colonies are pinned into every fourth colony position (lower right position in every tetrad of positions), creating a matrix of 384 control colonies on which a normalization surface of population doubling times is based. The local normalization surface is subtracted from each observation. Upper panel = distribution of population size doubling times of 1536 genetically identical colonies across a plate, before normalization. Each square corresponds to a colony position. Colour indicates population size doubling time. Lower panel: as upper panel, but colour represents population size doubling times after reference grid normalization.
Normalization was achieved by designating every 4th position as a control position. See also Fig S10. D) Frequency distribution of population size doubling times, before and after reference grid normalization, in a sample plate (blue experiments in B; 2% galactose + 0.85M NaCl). E) Box plot showing variation in population doubling time estimates (y-axis, CV between adjacent colonies) after normalization within each of the six genetically identical but environmentally distinct experiments in A, as a function of plate mean population doubling times (x-axis). Red line = median CV for all groups of adjacent colonies on the plate, box = interquartile range (mid 50%) of CVs, whiskers = complete range of CVs.
Spatial bias is removed by reference grid normalization. Genetically identical reference colonies are pinned into every fourth colony position, creating a matrix of 384 control colonies on which a normalization surface of population doubling times is based. The local normalization surface is subtracted from each observation. Left panel = distribution of population size doubling times of 1536 genetically identical colonies across a plate, before normalization. Each square corresponds to a colony position. Right panel = as left panel, but colour represents population size doubling times after reference grid normalization. Plates correspond to distinct environments, but all experiments are genetically identical, so an even surface is expected. Colour indicates population size doubling time, with red = slow growth and blue = fast growth. Colour scales are linear, but ranges vary between plates to maximize resolution. Ranges are indicated to the right.
Topology and strength of the spatial bias varied dramatically between plates and environments, producing complex patterns (Fig 3C, S10, before panels). The dominant tendency was a positive correlation between neighbours, which directly contradicts the competition-for-resources model believed to explain bias at later stages. To stringently normalize for such complex and unpredictable spatial bias of unknown origin, we replaced every fourth position on the plates with isogenic controls, creating an array of 384 reference positions (Fig S11). This defines an evenly spaced 2-dimensional array of control colony growth rates. By interpolation, this array reflects local spatial variation over the plate with high fidelity. Subtracting this spatial control surface of growth rates from the actual estimates provided position-normalized measures. This normalization removed the correlation between population doubling time and initial population size (Fig 3B) and considerably reduced variation across plates (Fig 3C, S10). Critically, it also brought the false positive rate close to random expectation, an order of magnitude lower than without spatial normalization (Fig 3A). Spatially normalized population doubling times were approximately normally distributed (Fig 3D), a fundamental requirement for applying standard parametric statistics. Spatial normalization also completely removed the correlation between relative error (CV) and signal strength (population doubling time) that would otherwise complicate statistical treatment (Fig 3E). After spatial normalization, the total measurement error in an unstressed environment was around 2% of the signal, approaching the 1.5% achieved with state-of-the-art microcultivation in liquid media (Warringer et al. 2003; Warringer et al. 2011).
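The reference-surface idea can be sketched on a 32 x 48 (1536) plate whose controls sit at the lower-right position of every 2 x 2 tetrad. This sketch assumes bilinear interpolation between neighbouring controls. It omits the control outlier removal and the median/Gaussian smoothing described in Materials and Methods. The function names are hypothetical, and this is not the Scan-o-matic implementation. The sign convention, log2(surface / observed), is also an assumption, chosen so that, as in the deletion-collection figure legends, negative values mean slower growth than the reference.

```python
import math


def control_surface(plate, row_off=1, col_off=1):
    """Bilinear reference surface interpolated from the control lattice.

    Assumes controls sit where row % 2 == row_off and col % 2 == col_off
    (the lower-right position of every 2 x 2 tetrad).
    """
    n_rows, n_cols = len(plate), len(plate[0])
    ctrl = [[plate[r][c] for c in range(col_off, n_cols, 2)]
            for r in range(row_off, n_rows, 2)]
    cr, cc = len(ctrl), len(ctrl[0])

    def interp(r, c):
        # Plate position in control-lattice coordinates, clamped at the edges
        # (constant extrapolation beyond the outermost controls).
        u = min(max((r - row_off) / 2, 0.0), cr - 1)
        v = min(max((c - col_off) / 2, 0.0), cc - 1)
        r0, c0 = int(u), int(v)
        r1, c1 = min(r0 + 1, cr - 1), min(c0 + 1, cc - 1)
        fu, fv = u - r0, v - c0
        top = ctrl[r0][c0] * (1 - fv) + ctrl[r0][c1] * fv
        bottom = ctrl[r1][c0] * (1 - fv) + ctrl[r1][c1] * fv
        return top * (1 - fu) + bottom * fu

    return [[interp(r, c) for c in range(n_cols)] for r in range(n_rows)]


def normalize_plate(plate):
    """log2(surface / observed): negative values mean slower-than-reference growth."""
    surface = control_surface(plate)
    return [[math.log2(ref / obs) for obs, ref in zip(row, srow)]
            for row, srow in zip(plate, surface)]
```

On a genetically identical plate with a smooth gradient, the normalized values are close to zero everywhere except where clamping applies at the outermost row and column. A colony whose doubling time is twice the local reference scores -1.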
Thus, although some spatial bias that is not fully captured by neighbouring controls does remain, this reference-surface normalization protocol permits a measurement accuracy that almost matches the best that can be achieved with lower throughput approaches.
Strains from two source plates are successively transferred to one target plate, C. Plate A contains the intended experiments, and plate B contains identical reference strains in all positions. Pinning is iterated four times (i–iv), resulting in a 3:1 ratio of experiments to controls on the target plate. Note that plate C should not be used directly as the experimental plate, because the spatial bias of control colony growth on this plate poorly reflects the spatial bias of experimental growth. Instead, plate C should be used as a pre-culture for the actual experimental plate.
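The four pinning rounds can be expressed as a mapping from 384-format source wells (16 x 24) to 1536-format target positions (32 x 48). Plate A is pinned at three tetrad offsets and plate B at the lower-right offset (1, 1). The offsets and their assignment to rounds i–iv are assumptions chosen to match the lower-right control position and the three juxtaposed replicates described in the legends and in Materials and Methods. `pinning_layout` is a hypothetical helper.

```python
def pinning_layout(src_rows=16, src_cols=24):
    """Map each target position (row, col) to (source plate, source row, source col).

    Assumed scheme: plate A (experiments) is pinned at tetrad offsets (0, 0),
    (0, 1) and (1, 0); plate B (controls) at the lower-right offset (1, 1).
    """
    rounds = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "B")]
    layout = {}
    for (dr, dc), source in rounds:  # pinning rounds i-iv
        for sr in range(src_rows):
            for sc in range(src_cols):
                layout[(2 * sr + dr, 2 * sc + dc)] = (source, sr, sc)
    return layout
```

Each experimental source well lands in three adjacent positions of its tetrad, and every tetrad carries one control, giving the 1152:384 (3:1) split.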
Scan-o-matic extends and nuances the salt biology of baker’s yeast
Gene-by-salt interactions have previously been called by scoring NaCl-sensitive gene deletion strains with state-of-the-art microcultivation, and the underlying biology has been extensively detailed (Warringer et al. 2003). To evaluate whether our approach recapitulates established knowledge, we compared it with previous microcultivation calls of salt-sensitive mutants in the complete yeast gene deletion collection.
After normalizing to growth in the absence of stress (Fig 4A), we identified a large number of salt-sensitive gene deletions. Many of these corresponded to proteins well known to control salt tolerance, such as Hog1 and Pbs2, which activate the osmo-response (Hohmann 2002), or the transcription factor Crz1. These deletions exhibited very similar growth curves on solid and in liquid media (Fig 4A, B). Fourteen biological processes were enriched among the salt-sensitive mutants identified in this genome-wide screen. Many of these, e.g. ion homeostasis, ion transport, response to osmotic stress, and endocytosis and vacuolar transport (Fig 4C), have previously been shown to be important during salt exposure (Warringer et al. 2003).
The haploid MATa yeast deletion collection was cultivated in 2% glucose, with and without 0.85M NaCl. Log2 population doubling times relative to the control surface of WT controls were extracted. Negative values represent growth defects. A) Upper panel: relative population size doubling times of the yeast deletion collection in the presence and absence of NaCl. Lower panel: colony population doubling times in NaCl were normalized to the corresponding measures in the absence of NaCl, estimating NaCl-specific growth effects. These effects were plotted as a function of relative population doubling times in the absence of NaCl. Little correlation remains. B) Growth dynamics of three sample deletion strains (n=2) with NaCl-specific growth defects, in Scan-o-matic and during liquid microcultivation. C) Functions enriched (Fisher’s exact test, FDR q<0.05) among the top 100, 200 and 400 most salt-sensitive deletion strains in Scan-o-matic. Cut-offs approximately correspond to relative growth defects larger than −0.27, −0.20 and −0.15. D) Frequency distributions of salt-specific deletion strain growth effects, obtained by solid substrate cultivation in Scan-o-matic and by liquid microcultivation in a Bioscreen C (Warringer and Blomberg 2003). E-F) A subset of 70 deletion strains was re-cultivated in the absence and presence of 0.85M NaCl at high replication, using Scan-o-matic (solid; n=24) and liquid (n=6) microcultivation, respectively. Re-cultivations were performed in parallel, removing all conceivable systematic variation besides cultivation method. E) Salt-specific growth defects in Scan-o-matic and liquid microcultivation regimes. Regression (black, Pearson R² is indicated) and 1:1 lines (red) are shown. F) Deletion strains were ranked based on salt-specific growth effects during solid substrate cultivation, and salt-specific growth effects were plotted. Error bars = SEM.
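For a single functional term, a one-sided Fisher's exact test for enrichment among the top-ranked salt-sensitive strains reduces to a hypergeometric upper-tail probability. The sketch below shows that calculation. The annotation source and the background gene set used for Fig 4C are not stated, so the arguments are generic, and the helper name is hypothetical.

```python
from math import comb


def enrichment_p(k, n_hits, n_annotated, n_total):
    """One-sided Fisher's exact (hypergeometric upper-tail) enrichment p-value.

    k: hits carrying the term; n_hits: size of the hit list (e.g. top 100);
    n_annotated: screened genes carrying the term; n_total: all screened genes.
    """
    denom = comb(n_total, n_hits)
    # P(X >= k); math.comb returns 0 for impossible draws, so no bounds checks needed.
    return sum(comb(n_annotated, i) * comb(n_total - n_annotated, n_hits - i)
               for i in range(k, min(n_hits, n_annotated) + 1)) / denom
```

The p-values from all tested terms would then be corrected for multiple testing to obtain the FDR q-values reported in the legend.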
However, the distribution of gene-by-salt interactions was substantially wider using our new protocols, with a distinct shoulder towards salt sensitivity (Fig 4D). Thus, at any given signal strength and significance stringency, more interactions were identified on agar than by liquid microcultivation. Signal strengths were also amplified for growth in the absence of stress (Fig S12A), suggesting that increased phenotypic variation on solid media is a general phenomenon. Notably, we also found that genes required for salt tolerance during liquid microcultivation were in many cases irrelevant for salt tolerance when scored on agar (data not shown). This suggests that the differences in salt-specific genotype-by-environment interactions reflect different demands on cells dispersed in a solution compared to cells fixed in a colony structure.
The haploid MATa yeast deletion collection was cultivated in Scan-o-matic in the absence of stress. Log2 population doubling times relative to the control surface of WT controls were extracted. Negative values represent growth defects. A) Frequency distributions of deletion strain growth effects in the absence of stress, obtained by solid substrate cultivation in Scan-o-matic and by liquid microcultivation in a Bioscreen C. B-C) A subset of 70 deletion strains was re-cultivated in the absence of stress at high replication, using Scan-o-matic (solid; n=24) and liquid (n=6) microcultivation, respectively. Re-cultivations were performed in parallel, removing all conceivable systematic variation besides cultivation method. B) Growth effects of gene deletions in solid (Scan-o-matic) and liquid microcultivation. Regression (black, Pearson R² is indicated) and 1:1 lines (red) are shown. C) Gene deletion strains were ranked based on growth effects during solid substrate cultivation, and growth effects were plotted. Error bars = SEM.
To exclude batch and temporal bias and to stringently test the influence of cultivation method on gene-by-salt interactions, we re-screened the same subset of genotypes with both cultivation methods. Both screens used identical media, random positioning and parallel screening. The correlation between cultivation methods was only moderate (r² = 0.5–0.6), both in the presence and absence of salt (Fig 4E, S12B, S12C). Because roughly 10% of the phenotypic variation could be attributed to technical noise, the remaining 30–40% derived from phenotypic effects imposed by differences in cultivation method.
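The 30–40% figure follows from simple bookkeeping. This bookkeeping assumes that the fraction of variance shared between methods (r²), the technical-noise fraction and the cultivation-specific fraction are independent and sum to one. The paragraph implies this but does not state it. The symbols v_noise and v_method are labels introduced here.

```latex
\begin{aligned}
1 &= r^2 + v_{\text{noise}} + v_{\text{method}} \\
v_{\text{method}} &\approx 1 - r^2 - v_{\text{noise}} \\
r^2 = 0.6:\quad v_{\text{method}} &\approx 1 - 0.6 - 0.1 = 0.3 \\
r^2 = 0.5:\quad v_{\text{method}} &\approx 1 - 0.5 - 0.1 = 0.4
\end{aligned}
```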
Both cultivation methods called the most important salt-defence regulators (Fig 4E). These included the signalling components Hog1 and Pbs2, and Rvs161 and Rvs167, which reorganize the actin cytoskeleton during Na+ stress (Balguerie et al. 2002; Lombardi and Riezman 2001). They also included the regulatory subunits Ckb1 and Ckb2 of the casein kinase that regulates Na+ extrusion (Glover 1998); the regulatory subunit of calcineurin, Cnb1, plus its downstream transcription factor Crz1, which controls Na+ efflux at current pH (Stathopoulos and Cyert 1997); and the cation extrusion regulator Sat4 (Mulet et al. 1999). However, most mutants were salt sensitive in only one cultivation regime. Despite an abundance of amino acids in the media, Aro2 and Aro7, both required for synthesis of aromatic amino acids, were needed for salt tolerance only during colony growth on agar. The same held for the RNA pol II component Med1 and the inhibitor of pseudohyphae formation, Sok2 (Fig 4F). In contrast, the clathrin-coated vesicle component Apl4 and the actin cytoskeleton-associated Sla1 were critical for salt tolerance only in cells reproducing dispersed in a solution. Overall, 81% of re-tested gene deletions differed significantly (Student’s t-test, FDR q<1%) in salt tolerance between solid media and microcultivation growth, with the majority (68%) of phenotypes being amplified on solid media. Thus, our new approach recapitulated the essence of established salt biology, but it also highlighted the large influence of cultivation approach on both quantitative and qualitative aspects of phenotypes.
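The text reports an FDR threshold (q<1%) without naming the procedure. The sketch below assumes the Benjamini-Hochberg procedure, the most common choice, for converting per-strain t-test p-values into q-values. `benjamini_hochberg` is a hypothetical helper.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q-values, returned in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so q-values are monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

A deletion strain would then be called different between cultivation regimes when its q-value falls below 0.01.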
DISCUSSION
Scan-o-matic as currently implemented drastically improves and sets a new standard in microbial growth phenomics but does not exhaust opportunities for future improvements.
Further reductions in random noise would require changing from 8- to 16-bit image depth, increasing scanning frequency or enhancing image resolution, all of which involve trade-offs. 16-bit image depth increases file size (doubling it for uncompressed images), challenging the logistics of data storage and analysis, and is poorly supported. Scanning frequency and image resolution trade off directly against each other. Enhanced image resolution reduces scanning speed and thereby increases exposure to radiation, heat and dehydration from the scanner light. This is a serious concern because exposure to intense light severely impedes growth (Logg et al. 2009). However, the light-sensitive hog1 and pbs2 mutants were not affected at the scanning frequency currently employed (data not shown).
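The storage side of the bit-depth trade-off can be estimated with a quick calculation. For an uncompressed single-channel image, bytes per scan scale linearly with bit depth, and 600 dpi corresponds to a pixel pitch of about 42 µm. The sketch ignores compression and file-format overhead, and the scan area passed in is up to the user. Both helpers are hypothetical.

```python
def raw_image_bytes(width_in, height_in, dpi=600, bit_depth=8):
    """Uncompressed size in bytes of one grey-scale (single-channel) scan."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bit_depth // 8


def pixel_pitch_um(dpi=600):
    """Side length of one pixel in micrometres (1 inch = 25,400 µm)."""
    return 25_400 / dpi
```

Multiplying the per-scan size by the number of scans (72 per day at the default 20 min interval) and by the number of scanners gives a first-order storage budget for a given bit depth.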
Reductions in bias require meticulous attention to experimental design or to posterior correction procedures. Early spatial bias derives primarily from systematic variation in the numbers or physiological states of the cells deposited. Variation in physiological state arises from transfer of metabolically distinct central or peripheral members of pre-culture colonies. To ensure that the normalization surface compensates for such bias, controls and experiments should originate from the same pre-culture plate. Late spatial bias derives from variation in nutrient access and toxin exposure. This variation depends on media thickness and homogeneity, and on the number and size of metabolically active colonies in the local environment. Standardization of plate casting procedures is thus essential. Media buffering is equally essential, because cell secretion of acidic metabolites otherwise drives patchwise pH variation (Fig S13) that affects local growth. Previously published posterior correction procedures have used spatial variation in the performance of the experimental colonies themselves. They assumed that spatial bias mainly manifests as an edge effect that can be compensated for row- or column-wise (Baryshnikova et al. 2010). Adopting this approach as an additional level of spatial bias normalization is unlikely to improve accuracy, for two reasons. First, using experimental data both as correction input and as final output creates unsound statistical dependencies and artefacts. Second, the edge effect is not a major driver of bias at the time of maximal growth rates. The remaining spatial bias is dramatic in nature, with plateaus of low and high bias separated by sharp fault lines, which makes its comprehensive posterior removal extremely challenging.
Cells secrete organic acids as by-products of carbon metabolism, lowering the pH of the local environment. Secreted acids diffuse slowly through the medium, creating systematic spatial variation in pH as a function of colony size and colony metabolic state. External pH affects growth by altering a wide range of molecular phenotypes. The figure shows a time-resolved view of pH change across a plate. A plate was cast with unbuffered SC medium (initial pH = 6.0) supplemented with a pH indicator (1 mg/50 mL bromocresol green). It was seeded with genetically identical BY4741 colonies (his3Δ::kanMX4) at uneven initial population sizes. Intense yellow = pH below 3.8, intense blue = pH above 5.4.
Condensing population growth curves into estimates of maximal growth rate makes poor use of the vast trove of accumulated data. The time before net growth commences (growth lag) and the total gain in population size (growth efficiency) are additional fitness components (Cooper 1991) capable of driving adaptation (Ibstedt et al. 2015). Noise is higher at the beginning and end of growth. More attention to experimental design and analysis procedures is therefore needed before these fitness components can also be estimated robustly. Beyond lag and efficiency, colonies rarely maintain maximal growth rates over extended periods, and growth often becomes multiphasic (Warringer et al. 2008). A single growth rate estimate, the maximal growth rate, will thus not fully capture the true dynamics of growth. More exhaustive future use of growth information will help ensure that the Scan-o-matic framework provides truly high-quality microbial phenomics data for extensive cohorts of individuals, generating well-populated, highly resolved and standardized databases.
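One common way to derive the two additional fitness components from a log2 population-size curve is the tangent method. In this method, lag is the time at which the tangent at maximal growth rate crosses the initial population size, and efficiency is the total gain, counted in doublings. The sketch below is illustrative only and is not how Scan-o-matic extracts these traits, since the text notes they are not yet estimated robustly. The maximal-rate point (t_max, y_max, rate_max) is assumed to come from a separate growth-rate fit, and the helper name is hypothetical.

```python
def lag_and_efficiency(times, log2_sizes, t_max, y_max, rate_max):
    """Tangent-method lag and total gain from a log2 population-size curve.

    t_max, y_max: time and log2 size at which the maximal rate was measured;
    rate_max: maximal slope, in doublings per time unit.
    """
    y0 = log2_sizes[0]
    # Lag: time after the first measurement at which the maximal-rate
    # tangent crosses the initial log2 population size.
    lag = t_max - (y_max - y0) / rate_max - times[0]
    # Efficiency: total population gain, expressed in doublings.
    efficiency = max(log2_sizes) - y0
    return lag, efficiency
```

Both quantities are sensitive to noise at the start and end of the curve, which is exactly where measurement noise is highest.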
MATERIALS AND METHODS
Physical arrangement of Scan-o-matic
Scan-o-matic is based on high-quality, mass-produced desktop scanners (Fig 1A) that are controlled by power managers (Fig S1A). Images are acquired with SANE (Scanner Access Now Easy) (Mosberger 1998) using transmissive scanning at 600 dpi, 8-bit grey-scale and a scan area extension that captures four plates per image (Fig 1B). Plates are held in custom-made acrylic glass fixtures with orientation markers that ensure software recognition of fixture position (Fig 1B). Fixtures are calibrated to the scanner using a fixture calibration model (Fig S2B, S4). Pixel intensities are standardized across instruments using transmissive grey scale calibration targets (Fig S4). Scanners are maintained in a 30°C, high-humidity environment and kept covered.
Scan-o-matic software, image acquisition and analysis
Scan-o-matic is written in Python 2.7 and can be installed from https://github.com/local-minimum/scanomatic/wiki. Matplotlib is used for graph production, and NumPy, SciPy and scikit-image are used for computation and analysis (van der Walt et al. 2011; van der Walt et al. 2014). Experiments are initiated from a web interface (Fig S2A). The minimum interval between scans is 7 minutes (20 min by default), and plates pinned in 96, 384 or 1536 format are supported. Each series of scans is analysed in a two-pass process. The first pass is performed during image acquisition (Fig S1B). Using a fixture calibration model (Fig S2B) and orientation markers (Fig S3A), the positions of the plates and of the transmissive grey scale calibration strip are identified (Fig S3B). The calibration strip area is trimmed (Fig S4A), and segment pixel intensities are compared to manufacturer-supplied values (Fig S4B). In the second pass (Fig S1C), a virtual grid is first established across each plate so that grid intersections match the centres of likely colonies (Fig S5). At grid intersections, the local area is segmented and colonies are defined relative to the local background. The pixel intensities of colony and background are then determined, compared and converted into cell number estimates (Fig S6, S7). Colony image areas were determined as the number of pixels included in each colony definition.
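Comparing calibration-strip segment intensities with manufacturer-supplied values yields a mapping from raw pixel intensity to a standardized scale. The text does not say what form this mapping takes. The sketch below assumes simple piecewise-linear interpolation with clamping outside the calibrated range, and the values in the check are made up. `make_calibration` is a hypothetical helper, not the Scan-o-matic calibration routine.

```python
from bisect import bisect_left


def make_calibration(measured, reference):
    """Piecewise-linear map from measured strip-segment intensities to reference values."""
    pairs = sorted(zip(measured, reference))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def calibrate(value):
        # Clamp outside the calibrated range rather than extrapolating.
        if value <= xs[0]:
            return ys[0]
        if value >= xs[-1]:
            return ys[-1]
        i = bisect_left(xs, value)  # xs[i - 1] < value <= xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (value - x0) / (x1 - x0)

    return calibrate
```

A fitted polynomial would give a smoother mapping. Piecewise-linear interpolation is used here only because it is short and exact at the calibration points.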
Extracting, evaluating and normalizing population growth rates from smoothed growth curves
Raw growth curves are smoothed, first using a median filter that removes local spikes and then using a Gaussian filter that reduces the remaining noise. Several quantities are then extracted from the smoothed growth curves: initial colony population size, maximal population size doubling time, the time at which the maximal doubling time was extracted, the error of the linear regression underlying that extraction, and the fit of the growth curve to an initial-value-extended version of the classical Chapman-Richards model. The latter three are used to flag poor-quality growth curves, which are visually inspected for potential rejection (~0.3% were rejected here).
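The smoothing and extraction steps can be sketched as a running median followed by a Gaussian-weighted average, then sliding-window least-squares regression on log2 population size. The slope of log2 size against time is in doublings per time unit, so its inverse is the doubling time. Filter widths, the regression window length and smoothing on the log2 scale are all illustrative assumptions, since the text does not specify them. The Chapman-Richards quality fit is omitted, and the function names are hypothetical.

```python
import math
import statistics


def median_filter(values, width=3):
    """Running median; the window shrinks at the ends of the curve."""
    half = width // 2
    return [statistics.median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]


def gaussian_filter(values, sigma=1.0):
    """Gaussian-weighted moving average, renormalized where the kernel is truncated."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        acc = weight = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):
                acc += w * values[j]
                weight += w
        smoothed.append(acc / weight)
    return smoothed


def max_doubling_time(times, sizes, window=5):
    """Shortest doubling time (and its window mid-time) from sliding-window OLS.

    Smoothing is applied to log2 population size here (an assumption).
    """
    y = gaussian_filter(median_filter([math.log2(s) for s in sizes]))
    best_slope, best_time = -math.inf, None
    for start in range(len(times) - window + 1):
        t = times[start:start + window]
        v = y[start:start + window]
        mt, mv = sum(t) / window, sum(v) / window
        slope = (sum((a - mt) * (b - mv) for a, b in zip(t, v))
                 / sum((a - mt) ** 2 for a in t))
        if slope > best_slope:
            best_slope, best_time = slope, mt
    if best_slope <= 0:
        return math.inf, best_time  # no net growth detected
    return 1.0 / best_slope, best_time
```

The median filter runs first so that a single spike is removed before the Gaussian filter can spread it over neighbouring time points.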
Every 4th experimental position was reserved here for internal isogenic controls. The doubling times of controls were used to establish a reference surface as follows. Controls with extreme values were removed. The remaining control positions were then used to interpolate a normalization surface. This surface was smoothed, first with a median filter to exclude any remaining noisy measurements and then with a Gaussian filter to soften the landscape contours. For each colony, the log2 difference between the observed doubling time and the doubling time of the normalization surface at that position was calculated. When multiple plates were included in an experimental series (Fig 4), we first normalized for between-plate bias by shifting the mean of the non-spatially normalized controls on each plate to match the mean over all plates. To call gene-by-salt interactions (Fig 4), we normalized growth rates in NaCl by subtracting growth rates in a no-stress environment.
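The remaining steps of this paragraph can be sketched as three small functions: control outlier removal, the between-plate mean shift and the gene-by-salt subtraction. The outlier rule (median ± 3 scaled MAD) is an assumption, because the text only says extreme controls were removed. "Mean over all plates" is read here as the mean of all pooled controls, which equals the mean of plate means when plates carry equal numbers of controls. All names are hypothetical.

```python
import statistics


def remove_extreme_controls(values, z=3.0):
    """Drop controls further than z robust SDs (scaled MAD) from the median (assumed rule)."""
    med = statistics.median(values)
    mad = 1.4826 * statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)
    return [v for v in values if abs(v - med) <= z * mad]


def between_plate_offsets(controls_by_plate):
    """Additive shift that moves each plate's control mean onto the pooled mean."""
    pooled = [v for vals in controls_by_plate.values() for v in vals]
    grand_mean = statistics.fmean(pooled)
    return {plate: grand_mean - statistics.fmean(vals)
            for plate, vals in controls_by_plate.items()}


def salt_specific_effects(rel_nacl, rel_basal):
    """Gene-by-salt effect: relative growth in NaCl minus relative growth without stress."""
    return {strain: rel_nacl[strain] - rel_basal[strain]
            for strain in rel_nacl.keys() & rel_basal.keys()}
```

Only strains scored in both environments receive a salt-specific value.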
Wet-lab experimental procedure
Solid media plates were cast with 50 mL of Synthetic Complete (SC) agar medium buffered to pH 5.8, with 2% glucose or 2% galactose as carbon source, with and without 0.85M NaCl. Two strain layouts were used: 1) all colonies being the diploid BY4743 reference strain (Brachmann et al. 1998) (Fig 1–3); 2) colonies corresponding to single gene knockouts of the haploid BY4741 deletion collection (Giaever et al. 2002), with WT control colonies interleaved in every fourth position and n=3 replicates of each strain in juxtaposition (Fig S11). For the confirmation experiment (Fig 4E-F, S12B-C), the same procedure was employed, but at high replication (n=24). The reference liquid media experiments (Fig 4E-F, S12B-C) were performed as previously described (Warringer et al. 2011).
ACKNOWLEDGEMENTS
We thank Olle Nerman and Mats Kvarnström for much appreciated statistical and analytical support and Charlie Boone for access to strains. Financial support from the Swedish Research Council (325–2014–6547 and 621–2014–4605) and from the Carl Trygger Foundation (CTS 12:521) to JW, from the Swedish Research Council FORMAS to AB, from the Slovenian Research Agency (P1–207 and L2–1112) to UP and PK, and from LABEX SIGNALIFE (ANR-11-LABX-0028–01) to JH is gratefully acknowledged.