ABSTRACT
The capacity to map traits over large cohorts of individuals – phenomics – lags far behind the explosive development in genomics. For microbes, growth is the key phenotype. We introduce an automated microbial phenomics framework that delivers accurate and highly resolved growth phenotypes at an unprecedented scale. Advancements were achieved through introduction of transmissive scanning hardware and software technology, frequent acquisition of precise colony population size measurements, extraction of population growth rates from growth curves and removal of spatial bias by reference-surface normalization. Our prototype arrangement automatically records and analyses close to 100,000 experiments in parallel. We demonstrate the power of the approach by extending and nuancing the known salt defence biology in baker’s yeast. The introduced framework will have a transformative impact by providing high-quality microbial phenomics data for extensive cohorts of individuals and generating well-populated and standardized phenomics databases.
INTRODUCTION
While our ability to detect genetic variation has improved tremendously (Koboldt et al. 2013; Mardis 2013), our capacity to rapidly and accurately map the phenotypic effects of this variation in large population cohorts – phenomics – has not made comparable gains (Houle et al. 2010). This is true also for microbes, where estimation of fitness components defines the key entrance point when searching for the concerted effects of causative genetic variation. As the majority of genetic or environmental effects on net fitness are modest in size (Thatcher et al. 1998; Wagner 2000), this demands access to measurement methodology capable of capturing subtle differences.
Micro-colony analysis estimates within-population variations (Levy et al. 2012) but is low throughput for differences between populations. Miniaturization of liquid cultures allows automation and highly accurate measurement of population density (Warringer and Blomberg 2003, 2014) but its associated costs in time and manpower are too high to allow massive scale-up. Tagging of strains with DNA sequence barcodes and monitoring tag frequencies in complex strain mixtures allows high throughput but at lower accuracy and at high initial investments (Giaever et al. 2002; Winzeler et al. 1999). Better cost-efficiency can be achieved using ordered arrays of microbes that are cultivated on solid media as colonies whose area sizes are estimated using cameras (Costanzo et al. 2010; Kvitek et al. 2008; Bean et al. 2014; Lawless et al. 2010). The set-up is attractive in that microbes are surveyed in a growth mode that better resembles their natural state, but the environment can rarely be maintained constant across plates. This leads to large spatial biases and false results. Spatial bias correction is a formidable challenge (Baryshnikova et al. 2010) because colonies exchange nutrients and toxins with the local environment and affect each other. Accentuating problems, 3-dimensional colony population size is typically approximated by measuring the often poorly correlated 2-dimensional colony area (Pipe and Grimson 2008). Furthermore, growth tends to be estimated from a single or few measures (Tong et al. 2001; Tong et al. 2004). As any particular colony size can be reached via an endless number of very different growth paths (Fig 1A), this unavoidably creates false positives and negatives as well as wrong conclusions.
The near-term goal of microbial phenomics is therefore to achieve the accuracy of liquid microcultivation in a solid media cultivation mode. To reach this goal the 3-dimensional topology of microbial colonies must be captured at high spatial and temporal resolution and reduced into accurate estimates of colony population sizes while exhaustively compensating for local environmental variations. Here we present a novel microbial phenomics methodology dubbed Scan-o-matic that overcomes the above hurdles and demonstrate its utility on the eukaryotic model organism baker’s yeast, Saccharomyces cerevisiae.
RESULTS
A novel framework for high-resolution microbial phenomics on solid media
To develop a high-resolution microbial growth phenomics framework for the average microbiology lab, we established a pipeline based on high-quality mass-produced desktop scanning technology (Fig 1B). The scanners allow transmissive light capture, enabling precise estimates of cell counts in microbial colonies (see below). Scans are initiated at pre-programmed intervals by means of a power manager that switches off scanner lamps immediately after scanning. This drastically reduces spatial bias from exposure of colonies closer to the lamp parking position to excessive light and temperature. Four solid media microcultivation plates, each containing from 96 to 1536 colonies, fit in each scanner (Fig 1C). Given initial instructions, Scan-o-matic completes the entire data acquisition to growth feature extraction pipeline autonomously (Fig S1, S2). Plates are fixed within an acrylic glass fixture where their precise positioning is automatically detected using orientation markers (Fig S3). Each fixture contains a transmissive grey-scale calibration strip to compensate for variations in pixel intensities between scans that otherwise represent a major obstacle for robust estimation of colony population size (Fig S4). Scan-o-matic segments images and robustly localizes colonies, avoiding misclassification from dust particles, plastic deformations or scratches (Fig 1D, S5, S6). Colony intensities are compared to local background pixel intensities and converted via a calibration function into cell counts per pixel (Fig S7). These are summed over the colony to produce precise estimates of colony population size. High frequency of estimates translates into well-resolved growth curves that are quality controlled with a semi-automated user interface (Fig S8).
Cultivation over a range of environmental contexts shows the vast majority (>99.5%) of growth curves to have the classical sigmoid shape, to be minimally affected by noise and systematic artefacts, and to have distinct environment specific properties (Fig 1E). Following a lag phase of variable length, early growth was found to be near exponential, and in unstressed environments it corresponded to a raw population size doubling time of 2.0 h (Coefficient of Variation, CV = 4.9%, n = 1536). Carbon limited populations growing on simple sugar sources reached distinct stationary phases when the carbon was depleted. Curve shapes were independent of pinning format (384 or 1536 colonies) (Fig S9A). Final cell number increased at lower colony pinning density presumably due to reduced competition for carbon and energy. However, initial population size was smaller in the denser formats as the smaller pin-heads deposited fewer cells. The net effect was that the 1536 format provided a longer growth span than the 384 format (mean of 5.1 vs. 4.9 doublings, p < 10^-8), faster growth and more stable estimates of the growth rate (Fig S9B).
The trade-off between image resolution and image acquisition time currently makes high-quality data beyond the 1536 format unattainable. Each scanner handles four plates and each computer controls three scanners; thus, even our prototype arrangement with five computers and 15 scanners is capable of running more than 92,000 individual growth experiments in parallel, using the 1536 format. This throughput is orders of magnitude better than what can be achieved by liquid microcultivation.
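The throughput figure follows directly from the hardware counts given above. The short sketch below reproduces the arithmetic for each allowed pinning format; the function and constant names are illustrative only and are not part of Scan-o-matic. It also reports how many positions remain for experimental strains if every fourth position is reserved for controls, as in the normalization design described in this paper.

```python
# Hardware counts taken from the text; names are illustrative, not Scan-o-matic API.
COMPUTERS = 5
SCANNERS_PER_COMPUTER = 3
PLATES_PER_SCANNER = 4
PINNING_FORMATS = (96, 384, 1536)  # colonies per plate


def parallel_experiments(colonies_per_plate: int) -> int:
    """Number of colonies (growth experiments) recorded at the same time."""
    scanners = COMPUTERS * SCANNERS_PER_COMPUTER
    return scanners * PLATES_PER_SCANNER * colonies_per_plate


def experimental_positions(colonies_per_plate: int) -> int:
    """Positions left for experimental strains when every fourth is a control."""
    return parallel_experiments(colonies_per_plate) * 3 // 4


for fmt in PINNING_FORMATS:
    print(fmt, parallel_experiments(fmt), experimental_positions(fmt))
```

For the 1536 format this gives 15 scanners x 4 plates x 1536 colonies = 92,160 growth curves, of which 69,120 are experimental positions under the every-fourth-position control layout.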
Enhanced measurement precision and accuracy of colony population size and growth rate
Highly time-resolved growth data offers decisive conceptual advantages since distinctive physiological states can be analysed (Fig 1A). To test whether our approach also offers technical advancements, we compared its precision and accuracy to that of the current microbial phenomics standard on solid media: measures of 2-dimensional area covered by the colony at a single point in time. When considering 1536 pinning format plates containing genetically identical cultures (wildtype, WT), colony population size growth curves contained less random noise than their colony area counterparts (Fig 2A). Using the standard error of the regression at the time of maximal growth as a measure of random curve noise, the population size growth curves were roughly twice as robust as the colony area growth curves (Fig 2B). This rather drastic reduction in random noise was expected, because the 2-dimensional colony area measures by definition contain less, and less resolved, information. This difference in information content increases as growth proceeds to reach a maximum at stationary phase entry. Here, colony growth, measured as image area covered by the colony, corresponded to a single doubling on average, while colony population size typically corresponded to more than four doublings (Fig 2A). Thus, the critical measure of precision, random noise as a fraction of signal strength (CV), vastly favours colony population size growth curves.
Precision provides an incomplete picture of technical achievement by overlooking systematic bias. Consideration of the total measurement error for a plate as the variance over genetically identical colonies divided by their mean showed accuracy to shift dramatically depending on growth state (Fig 2C). Accuracy was initially low for both area and population size, primarily due to large variation in the robotic delivery of cells, but also due to larger influence of randomness when pixels representing each colony were few (initially ~100). The total measurement error for colony area on a plate steadily decreased to reach a minimum late in the growth phase. Thereafter it rapidly increased as competition intensified, favouring growth of outer frame colonies with fewer neighbours. Variation in growth rates obtained from colony population size began similarly high, but dropped dramatically to reach a very low minimum, well below single point measures, in early exponential growth phase, when growth rates are maximal. We therefore extracted maximal growth rates, expressed as population size doubling times, as the most stable key feature of growth curves.
Considering six diverse environments, population size doubling times were estimated with about twice the accuracy of single measures of colony area at the end of growth (Fig 2D). This increase in accuracy is partly due to the lack of competition between colonies at the time at which growth is maximal and thus lower spatial bias (Fig 2E, F). The avoidance of colony competition effects represents not only a quantitative but also a qualitative advancement by averting confounding comparisons of strains in different physiological states (Fig 1A).
Comprehensive removal of spatial bias by reference-surface normalisation
Assuming the common, but naïve, null hypothesis approach that all systematic bias has been removed by the experimental design, we would expect ≈ 5% false positives for each of the above single genotype (WT) tests (α = 0.05, Student’s t-tests). However, even after extensive technical optimization and using the most robust measure of growth (population size doubling times) to minimize error, we scored many more (~10x) false positives than chance expectation (Fig 3A). This suggested substantial systematic bias to remain. The change in error-to-signal ratio over time (Fig 2C) indicated two distinct types of errors to dominate population size doubling time estimates: an initial bias from the number of cells deposited on plates that decreases in magnitude as colony population size increases, and a later, increasing bias that derives from increasing competition between neighbouring colonies (Fig 2E). Of these, the former is the most important, as reflected in a strong correlation between population size doubling times and initial population size (Fig 3B). However, bias from initial population size only explained about half of the error, as shown by the remaining false positives after normalizing the population size doubling times to the initially deposited number of cells (Fig 3A).
Topology and strength of the spatial bias varied dramatically between plates and environments to produce complex patterns (Fig 3C, S10 - before panels). The major tendency in the bias was a positive correlation between neighbours that directly contrasted with the competition-for-resources model believed to explain bias at later stages. To meet this challenge and to stringently normalize for complex and unpredictable spatial bias of unknown origin, we replaced every fourth position on the plates with isogenic controls, creating an array of 384 reference positions (Fig S11). This defines an evenly spaced 2-dimensional array of control colony growth rates that by interpolation reflects local spatial variations over plates with high fidelity. Subtraction of the spatial control-surface of growth rates from actual estimates provided position-normalized measures. This normalization removed the correlation of population doubling time to initial population size (Fig 3B), and it considerably reduced variation across plates (Fig 3C, S10). Critically, it also resulted in a false positive rate that approached random expectations by being an order of magnitude lower than the one obtained without spatial normalization (Fig 3A). Spatially normalized population doubling times were approximately normally distributed, a fundamental requirement for application of standard parametric statistics (Fig 3D). Spatial normalization also completely removed the correlations between relative error (CV) and signal strength (population doubling times) that otherwise would complicate statistical treatment (Fig 3E). After spatial normalization the total measurement error in an unstressed environment was around 2% of the signal, approaching the 1.5% achieved with state-of-the-art microcultivation in liquid media (Warringer et al. 2003; Warringer et al. 2011). 
Thus, although some spatial bias that is not fully captured by neighbouring controls does remain, this reference-surface normalization protocol permits a measurement accuracy that almost matches the best that can be achieved with lower throughput approaches.
Scan-o-matic extends and nuances the salt biology of baker’s yeast
Gene-by-salt interactions have previously been called by scoring NaCl-sensitive gene deletion strains with state-of-the-art microcultivation, and the underlying biology extensively detailed (Warringer et al. 2003). To evaluate the recapitulation of established knowledge, we compared our new method with previous microcultivation experimental calls of salt-sensitive mutants in the complete yeast gene deletion collection.
Normalising to growth defects in absence of any stress (Fig 4A), we identified a large number of salt-sensitive gene deletions. Many of these, such as those encoding Hog1 and Pbs2 that activate the osmo-response (Hohmann 2002) or the transcription factor Crz1, corresponded to proteins well known to control salt tolerance and exhibited growth curves on solid and liquid media that were very similar (Fig 4A, B). Among the salt-sensitive mutants identified in this genome-wide screen, 14 biological processes were enriched. Many of these, e.g. ion homeostasis, ion transport, response to osmotic stress, and endocytosis and vacuolar transport (Fig 4C), have previously been shown to be of importance during salt exposure (Warringer et al. 2003).
However, the distribution of gene-by-salt interactions was substantially wider using our new set of protocols, with a distinct shoulder towards salt sensitivity (Fig 4D). Thus, at any given signal strength and at any given significance stringency, more interactions were identified on agar compared to liquid microcultivation. The amplification of signal strengths of traits was evident also with regard to growth in absence of stress, suggesting the increase of phenotypic variation on solid media to be a general phenomenon (Fig S12A). Notably, we also found that genes that were required for salt tolerance during liquid microcultivation were in many cases irrelevant for salt tolerance when scored on agar (data not shown). This suggests that differences in salt-specific genotype-by-environment interactions were due to different demands on cells dispersed in a solution compared to cells fixed in a colony structure.
To exclude batch and temporal bias and stringently test the influence of cultivation method on the gene-by-salt interactions, we re-screened the same subset of genotypes with both cultivation methods using identical media, random positioning and parallel screening. We found correlation between cultivation methods to be only intermediate (r^2 = 0.5–0.6), both in the presence and absence of salt (Fig 4E, S12B, S12C). Roughly 10% of the phenotypic variation could be explained by technical noise, meaning that the remaining 30–40% derived from phenotypic effects imposed by cultivation method differences.
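The 30–40% figure is what remains of the total phenotypic variance once the fraction shared between cultivation methods (r^2) and the roughly 10% technical noise are subtracted. The snippet below spells out that subtraction; the function name is hypothetical and the calculation is a simple additive partition assumed for illustration, not a reanalysis of the data.

```python
def cultivation_method_fraction(r_squared: float, technical_noise: float = 0.10) -> float:
    """Variance fraction attributed to cultivation method differences.

    Assumes a simple additive partition of total variance:
    shared signal (r^2) + technical noise + cultivation method effect = 1.
    """
    unexplained = 1.0 - r_squared
    return round(unexplained - technical_noise, 2)


# Bounds of the reported correlation range between cultivation methods.
for r2 in (0.5, 0.6):
    print(r2, cultivation_method_fraction(r2))
```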
Both cultivation methods called the most important salt-defence regulators, like signalling components Hog1 and Pbs2, Rvs161 and Rvs167 that reorganize the actin cytoskeleton during Na+ stress (Balguerie et al. 2002; Lombardi and Riezman 2001), the regulatory subunits Ckb1 and Ckb2 of the casein kinase that regulates Na+ extrusion (Glover 1998), the regulatory subunit of calcineurin Cnb1 plus its downstream transcription factor Crz1 that controls Na+ efflux at current pH (Stathopoulos and Cyert 1997), and the cation extrusion regulator Sat4 (Mulet et al. 1999) (Fig 4E). However, most mutants were salt sensitive in only a single cultivation regime. Thus, despite an abundance of amino acids in the media, the absence of either Aro2 or Aro7, required for synthesis of aromatic amino acids, impaired salt tolerance only during colony growth on agar, as did the removal of the RNA pol II component Med1 or the inhibitor of pseudohyphae formation, Sok2 (Fig 4F). In contrast, presence of the clathrin coated vesicle component Apl4 and the actin cytoskeleton associated Sla1 was critical to salt tolerance only for cells reproducing dispersed in a solution. Overall, 81% of re-tested gene deletions differed significantly (Student’s t-test, FDR q<1%) in salt tolerance between solid media and microcultivation growth, the majority (68%) of phenotypes being amplified on solid media. Thus, whereas our new approach recapitulated the essence of established salt biology wisdom, it also highlighted the large influence of cultivation approach on both the quantitative and qualitative aspects of phenotypes.
DISCUSSION
Scan-o-matic as currently implemented drastically improves and sets a new standard in microbial growth phenomics but does not exhaust opportunities for future improvements.
Further reductions in random noise would require changing from 8- to 16-bit image depth, increased scanning frequency or enhanced image resolution, all of which are associated with trade-offs. 16-bit image depth increases file size, challenging the logistics of data storage and analyses, and is poorly supported. Scanning frequency and image resolution are in a direct quality trade-off as enhanced image resolution reduces scanning speed and thereby increases exposure to radiation, heat and dehydration from light. This is a serious concern because exposure to intense light severely impedes growth (Logg et al. 2009); however, the light-sensitive hog1 and pbs2 mutants were not affected by the scanning frequency currently employed (data not shown).
Reductions in bias require meticulous attention to experimental design or to posterior correction procedures. Early spatial bias derives primarily from systematic variations in numbers or physiological states of cells deposited, the latter emerging due to transfer of metabolically distinct central or peripheral members of pre-culture colonies. To ensure that the normalization surface compensates for such bias, controls and experiments should originate from the same pre-culture plate. Late spatial bias derives from variations in nutrient access and toxin exposure that depend on media thickness and homogeneity as well as number and size of metabolically active colonies in the local environment. Standardization of plate casting procedures is thus essential, as is media buffering, because cell secretion of acidic metabolites otherwise drives patch-wise pH variations (Fig S13) affecting local growth. Earlier published posterior correction procedures have used spatial variations in the performance of experimental colonies themselves and assumed that spatial bias mainly manifests as an edge effect that can be compensated for in a row/column wise manner (Baryshnikova et al. 2010). Adoption of this approach as an additional level of spatial bias normalization is unlikely to improve accuracy, first because use of experimental data both as correction input and as final output creates unsound statistical dependencies and artefacts and second because the edge effect is not a major driver of bias at the time of maximal growth rates. Due to its dramatic nature, with plateaus of low and high bias respectively and their separation by sharp fault lines, comprehensive posterior removal of the remaining spatial bias is extremely challenging.
Condensation of population growth curves into estimates of maximal growth rates makes poor use of the vast trove of accumulated data. Time before net growth commences (growth lag) and the total gain in population size (growth efficiency), are additional fitness components (Cooper 1991) capable of driving adaptation (Ibstedt et al. 2015). Higher noise in the beginning and end of growth means that more attention to the experimental design and analysis procedure is needed before robustness can be achieved also for these fitness components. Beyond lag and efficiency, colonies rarely maintain maximal growth rates over extended periods of time. In addition, growth often becomes multiphasic (Warringer et al. 2008). A single growth rate estimate, maximal growth rate, will thus not fully capture the true dynamics of growth. Future more exhaustive use of growth information will help ensure that the introduced Scan-o-matic framework provides truly high-quality microbial phenomics data for extensive cohorts of individuals to generate well-populated, highly resolved and standardized databases.
MATERIALS AND METHODS
Physical arrangement of Scan-o-matic
Scan-o-matic is based on high quality mass-produced desktop scanners (Fig 1A) that are controlled by power-managers (Fig S1A). Images are acquired using SANE (Scanner Access Now Easy) (Mosberger 1998) using transmissive scanning at 600 dpi, 8-bit grey-scale and a scan area extension that captures four plates per image (Fig 1B). Plates are fixed in custom-made acrylic glass fixtures with orientation markers ensuring software recognition of fixture position (Fig 1B). Fixtures are calibrated to scanner by a fixture calibration model (Fig S2B, S4). Pixel intensities are standardized across instruments using transmissive grey scale calibration targets (Fig S4). Scanners are maintained in a 30°C, high humidity environment and kept covered.
Scan-o-matic software, image acquisition and analysis
Scan-o-matic is written in Python 2.7 and can be installed from https://github.com/local-minimum/scanomatic/wiki. Matplotlib is used for graph production. NumPy, SciPy and scikit-image are used for computation and analysis (van der Walt et al. 2011; van der Walt et al. 2014). Experiments are initiated from a web-interface (Fig S2A), 7 minutes being the minimum time interval between scans (20 min as default) and 96, 384 or 1536 pinned plates as allowed formats. Each series of scans is analysed in a two-pass process. The first pass is performed during image acquisition (Fig S1B). Using a fixture calibration model (Fig S2B) and orientation markers (Fig S3A), the plate and transmissive grey scale calibration strip positions are identified (Fig S3B). The transmissive grey scale calibration strip area is trimmed (Fig S4A) and segment pixel intensities are compared to the manufacturer’s supplied values (Fig S4B). In the second-pass analysis (Fig S1C), a virtual grid is first established across each plate so that grid intersections match the centres of likely colonies (Fig S5). At grid intersections, the local area is segmented, colonies are defined relative to the local background, and pixel intensities of both are determined, compared and converted into actual cell number estimates (Fig S6, S7). Colony image areas are determined as the number of pixels included in each colony definition.
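The final step of the second pass, turning colony and background pixel intensities into a population size, can be illustrated with a minimal sketch. This is not the Scan-o-matic implementation: the polynomial form of the calibration function, its coefficients, and all function names are placeholders, and the pixel values are assumed to have already been standardized against the grey scale calibration strip. The sketch also assumes that colonies appear darker than the surrounding agar in transmissive scans.

```python
from statistics import median

# Placeholder coefficients for a polynomial calibration (hypothetical values).
CALIBRATION_COEFFS = (0.0, 50.0, 2.0)


def cells_per_pixel(darkening, coeffs=CALIBRATION_COEFFS):
    """Map how much darker a pixel is than background to a cell count."""
    return sum(c * darkening ** power for power, c in enumerate(coeffs))


def measure_colony(pixels, colony_mask):
    """Return (population size estimate, colony area in pixels) for one grid cell."""
    colony, background = [], []
    for pixel_row, mask_row in zip(pixels, colony_mask):
        for value, in_colony in zip(pixel_row, mask_row):
            (colony if in_colony else background).append(value)
    local_background = median(background)
    # Assumption: cells block transmitted light, so colony pixels have lower
    # values than the local background; negative differences are clipped to 0.
    population = sum(
        cells_per_pixel(max(local_background - value, 0.0)) for value in colony
    )
    return population, len(colony)


# Toy 3x3 grid cell with a single colony pixel in the centre.
pixels = [[200, 200, 200], [200, 190, 200], [200, 200, 200]]
mask = [[False, False, False], [False, True, False], [False, False, False]]
population, area = measure_colony(pixels, mask)
print(population, area)
```

The same pass yields both measures compared in the Results: the summed cell counts (population size) and the pixel count of the colony definition (colony area).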
Extracting, evaluating and normalizing population growth rates from smoothed growth curves
Raw growth curves are smoothed, first using a median filter that removes local spikes and then using a Gaussian filter that reduces remaining noise. Initial colony population size, maximal population size doubling time, time of extraction of the maximal population size doubling time, error of the linear regression that underlies extraction of the maximal doubling time, and the growth curve fit to an initial value extended version of the classical Chapman-Richards model are extracted from smoothed growth curves. The latter three are used to flag poor quality growth curves that are visually inspected for potential rejection (~0.3% were here rejected).
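The smoothing and extraction steps can be sketched as follows. This is a simplified illustration rather than the Scan-o-matic code: the filter window, the Gaussian sigma, the five-point regression window and the choice to smooth on a log2 scale are all assumptions, and the Chapman-Richards fit is omitted. The doubling time is taken as the inverse of the steepest local slope of log2 population size against time.

```python
import math
from statistics import median


def median_filter(values, window=5):
    """Replace each point by the median of its neighbourhood (removes spikes)."""
    half, n = window // 2, len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def gaussian_filter(values, sigma=1.5):
    """Weighted moving average with Gaussian weights, renormalized at the edges."""
    radius, n = int(3 * sigma), len(values)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(n):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            if 0 <= i + k < n:
                num += w * values[i + k]
                den += w
        smoothed.append(num / den)
    return smoothed


def linear_fit(xs, ys):
    """Least-squares slope and standard error of the regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, math.sqrt(sse / (n - 2))


def max_doubling_time(times_h, sizes, window=5):
    """Return (doubling time in h, time of extraction, regression error)."""
    log2_sizes = gaussian_filter(median_filter([math.log2(s) for s in sizes]))
    best = None
    for i in range(len(times_h) - window + 1):
        slope, error = linear_fit(times_h[i:i + window], log2_sizes[i:i + window])
        if best is None or slope > best[0]:
            best = (slope, times_h[i + window // 2], error)
    slope, when, error = best
    return 1.0 / slope, when, error


# Synthetic exponential curve sampled every 20 min with a 2.0 h doubling time.
times = [i / 3 for i in range(40)]
sizes = [1000 * 2 ** (t / 2.0) for t in times]
doubling_time, extraction_time, regression_error = max_doubling_time(times, sizes)
print(doubling_time, extraction_time, regression_error)
```

On real curves the regression error and the time of extraction returned here correspond to two of the quality flags listed above; a high error or an implausibly early or late extraction time would mark a curve for visual inspection.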
Every 4th experimental position was here reserved for internal isogenic controls. Doubling times of controls were used to establish a reference surface as follows. Controls with extreme values were removed. Remaining control positions were then used to interpolate a normalization surface that was smoothed, first with a median filter to exclude any remaining noisy measurements, and then with Gaussian smoothing to soften the landscape contours. For each colony, the log2 difference between the observed doubling time and the doubling time of the normalization surface in that position was calculated. When multiple plates were included in an experimental series (Fig 4), we first normalized for between-plates bias by shifting the mean of the non-spatially normalized controls on each plate to match the mean over all plates. To call gene-by-salt interactions (Fig 4) we normalized growth rates in NaCl by subtracting growth rates in a no stress environment.
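A minimal sketch of the reference-surface idea is given below, under several stated assumptions. The control layout (one control in each 2x2 block), the extreme-value threshold, and the use of inverse-distance weighting as the interpolation are all placeholders chosen for brevity; the median and Gaussian smoothing of the surface and the between-plate mean shift are omitted. Function names are hypothetical and do not correspond to the Scan-o-matic code.

```python
import math
from statistics import median


def is_control(row, col):
    """Assumed layout: one control per 2x2 block, i.e. every fourth position."""
    return row % 2 == 0 and col % 2 == 0


def reference_surface(plate, radius=3.0, max_log2_deviation=0.5):
    """Interpolate a log2 doubling-time surface from control positions."""
    rows, cols = len(plate), len(plate[0])
    controls = {
        (r, c): math.log2(plate[r][c])
        for r in range(rows) for c in range(cols) if is_control(r, c)
    }
    centre = median(controls.values())
    # Remove controls with extreme values (placeholder threshold).
    kept = {pos: v for pos, v in controls.items()
            if abs(v - centre) <= max_log2_deviation}
    surface = []
    for r in range(rows):
        surface_row = []
        for c in range(cols):
            num = den = 0.0
            for (cr, cc), value in kept.items():
                distance = math.hypot(r - cr, c - cc)
                if distance <= radius:
                    weight = 1.0 / (1.0 + distance ** 2)
                    num += weight * value
                    den += weight
            surface_row.append(num / den if den else centre)
        surface.append(surface_row)
    return surface


def normalise(plate):
    """log2 difference between observed doubling time and the local surface."""
    surface = reference_surface(plate)
    return [[math.log2(v) - s for v, s in zip(row, surface_row)]
            for row, surface_row in zip(plate, surface)]


def salt_interaction(normalised_nacl, normalised_no_stress):
    """Gene-by-salt interaction: NaCl phenotype minus no-stress phenotype."""
    return normalised_nacl - normalised_no_stress


# Toy 8x12 plate: 2.0 h doubling time everywhere, one slow strain at (3, 5).
plate = [[2.0] * 12 for _ in range(8)]
plate[3][5] = 3.0
normalised = normalise(plate)
print(normalised[3][5], normalised[0][0])
```

Because the surface is built only from control colonies, the experimental measurements never serve as both correction input and output, which is the statistical property argued for in the Discussion.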
Wet-lab experimental procedure
Solid media plates were cast with 50 mL of Synthetic Complete (SC) agar medium buffered to pH 5.8 with 2% glucose or 2% galactose as carbon source and with and without 0.85 M NaCl. Two strain layouts were used: 1) all colonies being diploid BY4743 reference strain (Brachmann et al. 1998) (Fig 1–3); 2) colonies corresponding to single yeast gene knockouts of the haploid BY4741 deletion collection (Giaever et al. 2002), with WT control colonies interleaved in every fourth position and n=3 replicates of each strain in juxtaposition (Fig S11). For the confirmation experiment (Fig 4E-F, S12B-C) the same procedure was employed, but at high replication (n=24 replicates). The reference liquid media experiments (Fig 4E-F, S12B-C) were as previously described (Warringer et al. 2011).
ACKNOWLEDGEMENTS
We thank Olle Nerman and Mats Kvarnström for much appreciated statistical and analytical support and Charlie Boone for access to strains. Financial support from the Swedish Research Council (325–2014–6547 and 621–2014–4605) and from the Carl Trygger Foundation (CTS 12:521) to JW, from the Swedish Research Council FORMAS to AB, from the Slovenian Research Agency (P1–207 and L2–1112) to UP and PK and from LABEX SIGNALIFE (ANR-11-LABX-0028–01) to JH is acknowledged.