Megapixel camera arrays for high-resolution animal tracking in multiwell plates

Tracking small laboratory animals such as flies, fish, and worms is used for phenotyping in neuroscience, genetics, disease modelling, and drug discovery. Current imaging systems are limited either in spatial resolution or throughput. A system capable of imaging a large number of animals with sufficient resolution to estimate their pose would enable a new class of experiments in which detailed behavioural differences are quantified at a scale where hundreds of treatments can be tested simultaneously. Here we report a new imaging system consisting of an array of six 12-megapixel cameras that can simultaneously record from all the wells of a 96-well plate with a resolution of 80 pixels/mm at 25 frames per second. We show that this resolution is sufficient to estimate the pose of nematode worms, including head identification, and to extract high-dimensional phenotypic fingerprints. We use the system to study behavioural variability across wild isolates, the sensitisation of worms to repeated blue light stimulation, the phenotypes of worm disease models, and worms’ behavioural responses to drug treatment. Because the system is compatible with standard multiwell plates, it makes computational ethological approaches accessible in existing high-throughput pipelines and greatly increases the scale of possible phenotypic screening experiments in C. elegans.


Introduction
Recording and quantifying animal behaviour is a core method in neuroscience, behavioural genetics, disease modelling, and psychiatric drug discovery. Both the scale of behaviour experiments and the information that can be extracted from them have increased dramatically [1][2][3][4] . However, further increases in throughput are possible and would enable entirely new kinds of experiments. We therefore sought to build a system to image freely behaving animals that would maximise both phenotypic content and experimental throughput. In terms of phenotypic content, a key parameter is the resolution of the recording. If there is sufficient spatial and temporal resolution, then body parts can be identified and tracked, the animal's pose can be estimated, and the full suite of computational ethology methods can be applied to analyse any behaviour of interest. Detailed pose estimation is well established for the roundworm C. elegans, which has a simple morphology [5][6][7][8][9][10][11][12][13][14][15][16][17][18] , and previous work has shown the usefulness of detailed behavioural phenotyping in several domains including, for example, classifying mutants 5,6,19,20 , studying chemotaxis 21 and thermotaxis 22 , quantifying escape responses [23][24][25] , and addressing basic questions in computational ethology and the physics of behaviour 10,26,27 . Sufficient resolution for pose estimation was therefore our first design constraint. In early worm trackers, maintaining high resolution required a motorised stage to keep a single worm in the field of view of a low-resolution camera 5,28,29 , but the availability of inexpensive megapixel cameras has enabled multiworm tracking with sufficient resolution to estimate each worm's pose and determine its head position 12,17 .
To maximise experimental throughput, we wanted to use off-the-shelf multiwell plates so that any behaviour screening pipeline would remain compatible with existing pipeline elements such as liquid- and plate-handling robots and small-animal sorting machines. Because behaviour unfolds over time, a standard plate-scanning approach, in which each well of a multiwell plate is imaged in turn using a motorised stage, limits throughput regardless of scan speed, since each well must be recorded long enough to observe the behaviour of interest. Moreover, mechanical arrangements with moving parts have higher maintenance costs and a higher risk of failure than a static camera system. Therefore, our second design constraint was that the system should image all of the wells of a multiwell plate simultaneously without moving parts.
Figure 1 (caption; beginning missing): The associated workstations to run the Motif software were arranged in the two server racks underneath. B) 3D schematic drawing of a single imaging unit (Kastl-Highres). The six cameras were mounted on a plate that is connected to the rig frame by three spring-loaded screws and can be moved along the vertical axis. This allows the focal plane of all six cameras to be changed at once. One of the imaging unit's side panels is omitted from this view. C) Technical drawing of an imaging unit annotated with dimensions in millimetres. D) Using five identical units, 480 wells can be recorded simultaneously. Zooming in to the E) camera, F) well, and G) worm level shows that this system achieves enough resolution to track the nematodes precisely. Each square well measures 8 x 8 mm.

Choice of suitable multiwell plate design
Because the fields of view of the six cameras partially overlap, the imaging system can accommodate multiwell plates with any number of wells. For our purposes, 96-well plates with square wells provided a good balance between imaging area and number of wells (Figure 1D, E). Plates with fewer wells would reduce imaging throughput, while 96-well plates with circular wells would reduce the area available for worms to behave and increase shadowing around the well edges (Supplementary Figure 1A). Using square-well plates (Whatman 96 well plate with flat bottom, GE Healthcare, Chicago, IL, USA) substantially increases the efficiency of the system: in our tests, with circular-well plates only 21% of the imaging area is available for capturing useful data, the rest being outside any well or lost in shadows, whereas with square wells 43% is available for behaviour, roughly twice as much. This fraction can be increased further by using custom plates with thinner well walls and shallower wells to reduce shadowing (Supplementary Figure 1B), but at a higher cost of manufacture.
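The resolution and well size quoted above allow a rough estimate of how many pixels each well receives. The sketch below uses only figures stated in this paper (80 px/mm, 8 x 8 mm wells, six 12-megapixel cameras); it ignores wall shadows and the overlap between camera fields of view, so it is a geometric ceiling and is not expected to reproduce the measured 43%.

```python
# Rough pixel budget per well, using only figures quoted in the text.
# Ignores wall shadows and camera field-of-view overlap (a geometric ceiling).

PX_PER_MM = 80            # stated spatial resolution
WELL_SIDE_MM = 8          # stated square-well size (8 x 8 mm)
N_WELLS = 96
N_CAMERAS = 6
PX_PER_CAMERA = 12e6      # nominal 12-megapixel sensors

well_side_px = PX_PER_MM * WELL_SIDE_MM              # pixels along one well edge
px_per_well = well_side_px ** 2                      # pixels on one well floor
floor_fraction = N_WELLS * px_per_well / (N_CAMERAS * PX_PER_CAMERA)

print(f"{well_side_px} px per well edge, {px_per_well / 1e6:.2f} MP per well")
print(f"well floors account for at most {floor_fraction:.0%} of all sensor pixels")
```

At 640 pixels per well edge, an adult worm about 1 mm long spans roughly 80 pixels along its body, which is what makes pose estimation and head identification feasible.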
The output of the combined system is 30 videos tiling across the five imaged multiwell plates, corresponding to 480 simultaneous behaviour assays (Figure 1D). Zooming in to the level of a single well (Figure 1F) and a single animal (Figure 1G) shows that the resolution is sufficient to estimate the pose and identify the head of individual worms, which can reveal detailed trajectory differences between individuals that are the basis for quantitative behavioural phenotyping.

High throughput imaging
Because USB3 cameras running at full bandwidth produce large amounts of raw image data (6 cameras recording at 25 fps produce approximately 6.5 TB/hour of raw footage), live compression on a dedicated system was required. To achieve this, we used a total of 10 Motif Recording Units (Motif Video Recording System, Loopbio GmbH, Vienna, Austria) equipped with Nvidia Quadro P2000 GPUs (Nvidia Corporation, Santa Clara, CA, USA), each recording from 3 cameras. The two recording units serving the cameras of the same array were set up in a parent-child configuration.
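The 6.5 TB/hour figure can be reproduced with simple arithmetic if one assumes 8-bit monochrome pixels (1 byte per pixel), an assumption not stated in the text. The sketch also shows the load on each recording unit, which handles three cameras.

```python
# Raw (uncompressed) data rate for one six-camera array.
# Assumption: 8-bit monochrome sensors, i.e. 1 byte per pixel.
PIXELS_PER_FRAME = 12e6          # 12-megapixel sensor
BYTES_PER_PIXEL = 1              # assumed, not stated in the text
FPS = 25
CAMERAS_PER_ARRAY = 6
CAMERAS_PER_RECORDING_UNIT = 3

bytes_per_camera_s = PIXELS_PER_FRAME * BYTES_PER_PIXEL * FPS
array_tb_per_hour = bytes_per_camera_s * CAMERAS_PER_ARRAY * 3600 / 1e12
unit_gb_per_s = bytes_per_camera_s * CAMERAS_PER_RECORDING_UNIT / 1e9

print(f"per camera: {bytes_per_camera_s / 1e6:.0f} MB/s")
print(f"per recording unit: {unit_gb_per_s:.1f} GB/s")
print(f"per array: {array_tb_per_hour:.2f} TB/hour")
```

Sustaining roughly 0.9 GB/s of raw input per recording unit is why compression has to happen on the fly during acquisition rather than after the fact.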
The Motif software acquires and compresses images on the fly and stores them in the open imgstore format (https://github.com/loopbio/imgstore) along with timestamps and frame numbers for each individual frame, as well as continuous, synchronised recordings of environmental data (for each unit: outside and inside temperature, humidity, and light level). Recording the time and frame number of each image allows precise timekeeping over long recordings because it prevents temporal drift caused by skipped or dropped frames and by differences between camera clocks. Additional metadata for each recording are saved with the imgstore, including the camera serial numbers, camera and system settings, and any user-defined data.
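To illustrate why per-frame metadata matters, the sketch below takes the stored frame numbers and timestamps of a recording (the kind of information imgstore keeps for every frame) and compares elapsed time computed naively from the frame count with elapsed time from the timestamps. The helper `check_timing` and its inputs are hypothetical; this is not imgstore's API.

```python
import numpy as np

def check_timing(frame_numbers, timestamps, fps=25.0):
    """Hypothetical helper: count dropped frames and compare two elapsed-time estimates.

    frame_numbers: camera frame counter for each stored frame (increasing)
    timestamps:    acquisition time in seconds for each stored frame
    """
    frame_numbers = np.asarray(frame_numbers)
    timestamps = np.asarray(timestamps, dtype=float)
    # Any jump larger than 1 in the frame counter means frames were skipped.
    n_dropped = int(np.sum(np.diff(frame_numbers) - 1))
    # Naive clock: assumes stored frames are exactly 1/fps apart, so it drifts.
    naive_elapsed_s = (len(frame_numbers) - 1) / fps
    # Timestamp clock: unaffected by dropped frames.
    true_elapsed_s = float(timestamps[-1] - timestamps[0])
    return n_dropped, naive_elapsed_s, true_elapsed_s

# Simulated 100-frame clip at 25 fps in which frames 50-59 were dropped.
frames = np.r_[0:50, 60:100]
stamps = frames / 25.0
n_dropped, naive_s, true_s = check_timing(frames, stamps)
print(n_dropped, naive_s, true_s)
```

In this example the naive clock lags the timestamps by 0.4 s (the ten missing frames); over hours of recording such errors accumulate, which is the drift the per-frame metadata avoids.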
A single workstation manages all imaging experiments on all units across the whole system, from video collection to data transfer, by accessing the Motif user interfaces of the parent machines in the camera arrays through a web browser. Given the large number of high-resolution cameras, the control workstation was connected to a large monitor (we use a 43-inch 4K monitor) to facilitate camera focussing and sample positioning.
In addition to providing a web-accessible user interface, Motif allows complete programmatic control of the camera arrays, including arbitrarily complex scheduling of data acquisition and photostimulation, via an API (https://github.com/loopbio/python-motifapi). This allows us to run imaging experiments on all camera arrays by executing a single Python script on the control workstation. Encoding experimental parameters in a script improves reproducibility because the same parameters are applied in every run by default.
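As a minimal illustration of encoding experiment parameters in a script, the sketch below describes the standard three-part protocol (see Materials and Methods) as immutable Python data and serialises it to JSON so that the exact parameters can be stored alongside each run. The `Recording` and `ImagingRun` classes are hypothetical and do not use the Motif API; in practice these values would be passed to python-motifapi calls, which are not shown here.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class Recording:
    """One video in an imaging run (hypothetical structure, not the Motif API)."""
    name: str
    duration_s: int
    pulse_starts_s: tuple = ()     # blue-light pulse onsets relative to video start
    pulse_length_s: int = 10

@dataclass(frozen=True)
class ImagingRun:
    fps: int = 25
    recordings: tuple = ()

# The standard protocol: 5 min pre-stimulus, 6 min with three pulses, 5 min post-stimulus.
STANDARD_RUN = ImagingRun(recordings=(
    Recording("prestim", 5 * 60),
    Recording("bluelight", 6 * 60, pulse_starts_s=(60, 160, 260)),
    Recording("poststim", 5 * 60),
))

def total_duration_s(run: ImagingRun) -> int:
    return sum(r.duration_s for r in run.recordings)

# Serialising the parameters next to the data records exactly what was run.
params_json = json.dumps(asdict(STANDARD_RUN), indent=2)
print(total_duration_s(STANDARD_RUN), "s of recording per run")
```

Making the dataclasses frozen means a protocol cannot be edited accidentally mid-script; a new protocol has to be declared as a new object.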
Updates to Tierpsy Tracker, and companion software, for multiwell imaging format
In our camera array system, each camera records multiple wells, which complicates metadata handling because there is no longer a one-to-one correspondence between a video file and a particular experimental condition. We have updated Tierpsy Tracker 17 to handle videos with multiple wells: it can automatically identify wells from the video (Figure 1F) and return results on a well-by-well basis. In the Viewer, the user can see the names and boundaries of the wells and can mark any well as "bad" if necessary. This flag is propagated to the final tracking results so that "bad" wells can be filtered out in downstream analysis.
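Tierpsy Tracker detects wells automatically from the image; the sketch below shows only the simpler downstream step of assigning tracked objects to named wells and dropping wells flagged as bad. It assumes, for illustration, that the wells in one camera's view form an axis-aligned grid with a known origin and pitch (720 px in the example, i.e. the 9 mm well spacing of standard 96-well plates at 80 px/mm) and a 4 x 4 block per camera (one sixth of a 96-well plate). `assign_wells` and `drop_bad_wells` are hypothetical helpers, not Tierpsy functions.

```python
import numpy as np

ROWS = "ABCDEFGH"

def assign_wells(xy_px, origin_px, pitch_px, first_row=0, first_col=0, n_rows=4, n_cols=4):
    """Map object centroids (x, y in pixels) to well names such as 'B3'.

    Simplified sketch: assumes an axis-aligned well grid with a known top-left
    corner and pitch. Objects falling outside the grid are assigned None.
    """
    xy = np.asarray(xy_px, dtype=float)
    col = np.floor((xy[:, 0] - origin_px[0]) / pitch_px).astype(int)
    row = np.floor((xy[:, 1] - origin_px[1]) / pitch_px).astype(int)
    inside = (col >= 0) & (col < n_cols) & (row >= 0) & (row < n_rows)
    return [f"{ROWS[first_row + r]}{first_col + c + 1}" if ok else None
            for r, c, ok in zip(row, col, inside)]

def drop_bad_wells(records, bad_wells):
    """Remove per-well results for wells flagged as 'bad' in the viewer."""
    return [rec for rec in records if rec["well"] not in bad_wells]

centroids = [(150, 150), (900, 200), (200, 1000), (50, 50)]
print(assign_wells(centroids, origin_px=(100, 100), pitch_px=720))
```

For the camera covering, say, rows E-H and columns 5-8, the same function would be called with `first_row=4, first_col=4`.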
To keep track of the experimental conditions of each well, we have developed an open-source Python module to handle experimental metadata automatically (github.com/Tierpsy/tierpsy-tools-python). For each experiment, a series of csv files specifies the worms and compounds (if applicable) that were added to each well. This can include information on how a COPAS worm sorter (Union Biometrica) was used to dispense different strains into the wells of the imaging plates, which compound source plate was replicated onto each imaging plate, or any column shuffling performed by a liquid handling robot. These tables are then combined to create a mapping between each well in an imaging plate (identified by a unique ID) and an experimental condition. For each imaging run, the user needs to log the camera array used for each imaging plate at the time of the experiment. This information is then mapped to video file names to create a final metadata table suitable for subsequent analysis (see Materials and Methods and Supplementary Figure 3 for more details).
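The mapping described above amounts to a chain of table joins: well contents to imaging plate, plate to camera array, array and well to camera, and camera to video. The sketch below performs it with the Python standard library on toy CSV tables; the column names, plate and camera identifiers, and the video-name pattern are all hypothetical and need not match tierpsy-tools-python.

```python
import csv
import io

# Toy inputs (hypothetical columns and identifiers).
WELL_CONTENTS = """imaging_plate_id,well_name,worm_strain,compound
P01,A1,N2,none
P01,A2,CB4856,none
P02,A1,N2,compound_1
"""
RUN_LOG = """imaging_plate_id,run_number,camera_array
P01,1,rig1
P02,1,rig2
"""
CAMERA_WELLS = """camera_array,camera_serial,well_name
rig1,cam101,A1
rig1,cam101,A2
rig2,cam201,A1
"""

def read_table(text):
    return list(csv.DictReader(io.StringIO(text)))

def build_metadata(well_contents, run_log, camera_wells):
    """One output row per imaged well: contents + run + camera + video name."""
    run_of_plate = {r["imaging_plate_id"]: r for r in run_log}
    cameras = {}                                   # (array, well) -> camera serials
    for r in camera_wells:
        cameras.setdefault((r["camera_array"], r["well_name"]), []).append(r["camera_serial"])
    rows = []
    for w in well_contents:
        run = run_of_plate[w["imaging_plate_id"]]
        for serial in cameras.get((run["camera_array"], w["well_name"]), []):
            # Hypothetical naming scheme linking a well to its video file.
            video = f"run{run['run_number']}_{serial}"
            rows.append({**w, **run, "camera_serial": serial, "video_stem": video})
    return rows

metadata = build_metadata(read_table(WELL_CONTENTS), read_table(RUN_LOG),
                          read_table(CAMERA_WELLS))
for row in metadata:
    print(row["imaging_plate_id"], row["well_name"], row["worm_strain"], row["video_stem"])
```

Keeping each source table separate and joining only at the end means a single error (for example, the wrong camera array logged for a plate) can be corrected in one place and the final table regenerated.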
Another key software improvement we incorporated is a convolutional neural network (CNN) to exclude non-worm objects from subsequent analysis. While we previously used contrast-based segmentation and size-based filtering for worm detection in our analysis 17 , introducing the CNN into Tierpsy Tracker improves the quality of the tracking data and the subsequent analysis results as well as the speed of the analysis because fewer objects are analysed in subsequent steps (see Materials and Methods for more details).

Rapid assessment of natural variation in behaviour
We tracked the behaviour of N2 and the wild isolates of the divergent set in the C. elegans Natural Diversity Resource (CeNDR) strain collection with our system to detect natural variation in behaviour 30 . To further increase the dimensionality of the behavioural phenotypes, we included a blue light stimulation protocol using a set of four bright blue LEDs. Each tracking experiment is divided into three parts: 1) a 5-minute pre-stimulus recording, 2) a 6-minute stimulus recording with three 10-second blue light pulses starting at 60, 160, and 260 seconds, and 3) a 5-minute post-stimulus recording. Blue light can elicit an escape response in worms, thus expanding the range of observable behaviours 31,32 . Programmable blue light stimulation is reproducible, compatible with high-throughput assays, and also useful for optogenetic stimulation.
Figure 2 (caption; beginning missing): Stars indicate statistically significant differences between N2 and wild isolates at a false discovery rate of 1% using Kruskal-Wallis tests, correcting for multiple comparisons with the Benjamini-Yekutieli method. A) Morphological differences were detected between strains; length and midbody width varied non-uniformly among strains. B) Adequate resolution enabled detailed characterisation of worm posture and the detection of differences among strains in multiple dimensions. The curvature of different body parts varied non-uniformly among strains, with the neck curvature showing more significant differences. Body parts are defined following the conventions adopted in Tierpsy Tracker 33 . C) The speed of wild isolates was on average higher than that of N2 worms. The response of wild isolates to the blue light stimulus varied; some strains (e.g. EG4725) were more sensitive to blue light than N2, while others showed a less obvious escape response (e.g. DL238). This provided additional dimensions to the behavioural phenotype. D) Using the quantitative behavioural phenotypes, strains were classified with significantly higher accuracy than chance. Combining features from different blue light conditions increased the dimensionality of the data and the classification accuracy between strains. E) Worm strains were predicted in a held-out test set with 66% accuracy, higher than chance (9%).
We tracked on average 20 wells per strain. Given the high throughput of our new system, this experiment can be performed within a few hours. The camera-array recordings retain enough resolution to extract the full set of Tierpsy features 33 , which describe in detail the morphology, movement, and posture of the worms, including subdivision by motion mode (forward, backward, and paused) and body part. We extract 3076 summary features per well for each recording period (pre-stimulus, blue light, and post-stimulus), giving a total of 9228 features per well. This allows us to detect fine differences in the morphology, posture, and movement of the worms, which vary non-uniformly among wild isolates (Figure 2A-C). The neck curvature of wild isolates tends to differ more significantly from that of N2 worms than the curvature of other body parts does, which might be related to differences in foraging behaviour between N2 and wild isolates (Figure 2B). However, not all strains show the same curvature pattern across the body, indicating natural variation in posture. All the wild isolates move faster on average than N2 worms, but their response to blue light varies (some are more and others less sensitive to blue light), showing that the blue light stimulus increases the dimensionality of behavioural differences (Figure 2C).
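The test used for Figure 2 (per-feature Kruskal-Wallis tests between N2 and each wild isolate, with Benjamini-Yekutieli correction at a 1% false discovery rate) can be sketched as follows. The BY adjustment is written out explicitly for clarity (statsmodels' `multipletests(..., method='fdr_by')` provides the same correction), and the synthetic data in the usage lines are invented for illustration.

```python
import numpy as np
from scipy.stats import kruskal

def benjamini_yekutieli(pvals):
    """Benjamini-Yekutieli adjusted p-values (valid under arbitrary dependence)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))           # harmonic correction factor
    order = np.argsort(p)
    ranked = p[order] * m * c_m / np.arange(1, m + 1)
    # Enforce monotonicity: each adjusted value is the minimum over larger ranks.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty_like(ranked)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def significant_features(ref, other, fdr=0.01):
    """Per-feature Kruskal-Wallis test between two groups of wells, BY-corrected.

    ref, other: arrays of shape (n_wells, n_features)
    """
    pvals = np.array([kruskal(ref[:, k], other[:, k]).pvalue
                      for k in range(ref.shape[1])])
    return benjamini_yekutieli(pvals) < fdr

# Synthetic example: 20 wells per strain, 10 features, only feature 0 truly differs.
rng = np.random.default_rng(0)
n2 = rng.normal(size=(20, 10))
isolate = rng.normal(size=(20, 10))
isolate[:, 0] += 5
hits = significant_features(n2, isolate)
print(np.flatnonzero(hits))
```

BY is more conservative than the Benjamini-Hochberg procedure but remains valid when features are correlated, which is typical of Tierpsy features.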
To assess how well a worm strain can be predicted from its behavioural fingerprint, we estimated the classification accuracy of a random forest classifier. We first split the data into a training/tuning set and a held-out test set. We used the training set to select features by recursive feature elimination (RFE) and to tune the hyperparameters of the model. Figure 2D shows the highest cross-validation accuracy achieved for different sizes of selected feature sets. Feature selection improves accuracy because it increases the samples-to-features ratio, which helps control overfitting, and reduces the correlation between features. Combining features from different blue light conditions (blue curve) increases the dimensionality of the data and the classification accuracy. Using the best-performing features and hyperparameters, we trained a classifier on the entire training/tuning set and used it to make predictions on the test set. The test accuracy was 66%, significantly higher than chance (9%).
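The evaluation scheme described above (held-out test set, RFE-based feature selection inside cross-validation on the training set, one final score on the test set) can be sketched with scikit-learn. Synthetic data stand in for the well-by-feature matrix, the candidate feature-set sizes are arbitrary, and the hyperparameter search is omitted for brevity; this is an illustration, not the authors' code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in: rows are wells, columns are behavioural features, y is the strain.
X, y = make_classification(n_samples=300, n_features=60, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# 1) Set aside a held-out test set before any feature selection or tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

def make_model(n_features):
    # RFE refits a forest and drops the least important features, removing
    # 20% of the original feature count per iteration until n_features remain.
    selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                   n_features_to_select=n_features, step=0.2)
    classifier = RandomForestClassifier(n_estimators=100, random_state=0)
    return make_pipeline(selector, classifier)

# 2) Pick the feature-set size by cross-validation on the training set only;
#    RFE sits inside the pipeline, so selection is redone within every fold.
cv_accuracy = {k: cross_val_score(make_model(k), X_train, y_train, cv=3).mean()
               for k in (5, 10, 20)}
best_k = max(cv_accuracy, key=cv_accuracy.get)

# 3) Refit on the whole training set and score once on the held-out test set.
final_model = make_model(best_k).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)
chance = 1 / len(set(y))
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}, chance = {chance:.2f}")
```

Placing RFE inside the pipeline matters: selecting features on the full training set before cross-validating would leak information into the validation folds and inflate the accuracy curve.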

Temporal response and sensitisation to aversive blue light stimulation
Having established that blue light stimulation improves classification accuracy, we further investigated the response elicited by blue light in N2 and CB4856 at a higher temporal resolution.
We imaged N2 and CB4856 worms with blue light stimulation and observed different behavioural responses in the two strains. We extracted the same set of 3076 previously defined features 33 with a time resolution of 10 seconds and used these to construct the behavioural phenotype space. N2 and CB4856 have well-known behavioural differences 26,[34][35][36][37][38] and are expected to occupy different regions of the phenotype space. Principal component analysis (PCA) shows that blue light stimulation moves the strains from their already distinct positions in the plane defined by the first two principal components (PCs) to new positions, indicating detectable responses to the stimulus in both strains (Figure 3A). The blue light-induced displacement through phenotype space led to better separation between the two strains (Figure 3A, right), confirming that adding the stimulus can reveal further behavioural differences between two strains already known to be distinct.
Figure 3 (caption; beginning missing): [...] strains, with the fraction of worms moving forwards increasing during the stimulus and decreasing after the stimulus. However, post-stimulus recovery appears to occur on two timescales for N2 but not for CB4856. Solid lines are means; shaded areas show the 95% confidence interval. C) Repeated photostimulation triggered an increasing aversive response in N2 and left a higher fraction of worms stationary after serial stimulation than before (the vertical separation between the two dashed lines contrasts the before and after levels). D) The fraction of worms triggered to move forwards by each stimulus increased throughout the stimulation series, with the opposite trend for worms initiating movement after pausing. Each data point was obtained by taking the difference between 10-second windows just before and just after the end of each stimulus. E) After repeated photostimulation, a larger fraction of the population than before was stationary. This was quantified by taking the difference of the population fractions in each motion mode between the final 5 minutes and the initial 5 minutes of the experiment (red dashed lines in C).
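The PCA projections used here and for the ALS strains place samples in the plane of the first two principal components. A minimal NumPy version is sketched below; z-scoring features before the decomposition is our assumption (it stops features with large numerical ranges from dominating), and the two synthetic groups are invented for illustration.

```python
import numpy as np

def pca_2d(features):
    """Scores on the first two principal components, plus their explained-variance ratios.

    features: array (n_samples, n_features), e.g. one row per strain per 10 s window.
    """
    X = np.asarray(features, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                      # leave constant features at zero instead of NaN
    X = (X - X.mean(axis=0)) / sd          # z-score each feature
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[:2].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:2]

# Two synthetic "strains" that differ in the first 5 of 20 features.
rng = np.random.default_rng(1)
strain_a = rng.normal(size=(30, 20))
strain_b = rng.normal(size=(30, 20))
strain_b[:, :5] += 3.0
scores, explained = pca_2d(np.vstack([strain_a, strain_b]))
separation = abs(scores[:30, 0].mean() - scores[30:, 0].mean())
print(f"PC1/PC2 explained variance: {explained.round(2)}, PC1 separation: {separation:.1f}")
```

The sign of each principal component is arbitrary, so only distances and relative positions in the PC plane, not the direction of a displacement, should be interpreted.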
The C. elegans escape response is characterised by a combination of increased forward locomotion and decreased spontaneous reversals 39 . The differences in blue light-induced escape response between the N2 and CB4856 strains can therefore also be seen by simply examining the fraction of the worm population moving forwards, moving backwards, or remaining stationary. For both strains, a single dose of photostimulation triggers a sharp, sustained increase in the fraction of worms moving forwards, followed by a relaxation towards the pre-stimulus level once the stimulus ceases. The fraction of worms moving backwards increases slightly at the beginning of the stimulus and then decreases, without increasing again until the stimulus is over. Finally, the fraction of stationary worms declines rapidly during the stimulus and is restored after the stimulus ends (Figure 3B). However, while in CB4856 the population fractions return steadily to their pre-stimulus levels, N2 shows a sharp initial decline in forward-moving worms over several seconds (and a corresponding sharp increase in stationary and backward-moving worms) before relaxing steadily. Repeated photostimulation (twenty pulses of 10 s on, 90 s off) of N2 worms causes sensitisation, as light pulses trigger a progressively increasing fraction of worms to move forwards (Figure 3C, D). Meanwhile, between light pulses, worms recover to a progressively decreasing baseline level of forward locomotion (and, conversely, a progressively increasing stationary fraction), possibly due to fatigue from increased activity during the pulses. This reduced forward locomotion fraction persists in the absence of photostimulation, with no obvious return towards pre-stimulus levels over the 6.5 minutes after the final pulse (Figure 3C, E).
The combined effects of sensitisation and fatigue lead to a roughly linear increase in the response over successive light pulses, as illustrated by the difference between the fractions of worms moving forward before and after stimulation (Figure 3D).
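The population-level measures behind Figure 3C-E reduce to window averages of per-frame motion-mode labels. The sketch below assumes a hypothetical integer encoding (1 forward, 0 paused, -1 backward) stored as an array of frames by worms; the sign convention of the per-pulse response (forward fraction in the last 10 s of the pulse minus that in the 10 s after it) is our choice. The synthetic input follows the serial protocol in Materials and Methods, with worms moving forward only while the light is on.

```python
import numpy as np

FPS = 25
FORWARD, PAUSED, BACKWARD = 1, 0, -1        # hypothetical label encoding

def window_fraction(modes, t0_s, t1_s, mode=FORWARD, fps=FPS):
    """Fraction of worm-frames in [t0_s, t1_s) whose label equals `mode`.

    modes: int array of shape (n_frames, n_worms), one label per worm per frame.
    """
    segment = modes[int(round(t0_s * fps)):int(round(t1_s * fps))]
    return float(np.mean(segment == mode))

def pulse_responses(modes, pulse_starts_s, pulse_len_s=10, win_s=10):
    """Forward fraction in the window just before each pulse ends minus the window just after."""
    responses = []
    for start in pulse_starts_s:
        end = start + pulse_len_s
        responses.append(window_fraction(modes, end - win_s, end)
                         - window_fraction(modes, end, end + win_s))
    return np.array(responses)

def long_term_shift(modes, mode=PAUSED, span_s=300, fps=FPS):
    """Change in a mode's fraction between the final and initial 5 minutes of a recording."""
    total_s = modes.shape[0] / fps
    return (window_fraction(modes, total_s - span_s, total_s, mode, fps)
            - window_fraction(modes, 0, span_s, mode, fps))

# Synthetic serial-stimulation recording: 5 min off, 20 x (10 s on, 90 s off), 5 min off.
starts = [300 + 100 * i for i in range(20)]
modes = np.full((2600 * FPS, 3), PAUSED)     # 3 worms, paused by default
for s in starts:
    modes[s * FPS:(s + 10) * FPS] = FORWARD   # forward only while the light is on
print(pulse_responses(modes, starts)[:3], long_term_shift(modes))
```

In this idealised input every pulse produces the same response and the stationary fraction ends where it started; sensitisation and fatigue would appear as a trend across `pulse_responses` and a positive `long_term_shift`, respectively.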
Photostimulation can thus better distinguish between worm strains using existing pre-defined feature sets, and can also generate new features that quantify the details of the escape behaviour. Similar experiments on habituation to repeated mechanical stimulation have been used extensively to study learning in C. elegans [40][41][42] . Aversive blue light is sensed through different sensory neurons from mechanical stimuli but converges on the same motor circuits, and so may provide useful comparative data for investigating the genetics and neuroscience of learning mechanisms. These new, interpretable features also increase the dimensionality of the worm behavioural phenotype space, which may be useful for phenotyping applications.
Behavioural phenotypes of ALS disease models in response to blue light
A previous study generated several Amyotrophic Lateral Sclerosis (ALS) disease model strains that carry ALS patient amino acid substitutions in the C. elegans sod-1 gene 43 . This study found that the disease model strains have no obvious behavioural defects unless they are exposed to oxidative stress by overnight treatment with paraquat.
We phenotyped these ALS disease model strains on our system and saw similar results. PCA of a pre-defined set of 256 Tierpsy features 33 under standard imaging conditions (five minutes of spontaneous behaviour) does not show clear differences between the strains (Figure 4A). Adding blue light pulses (three 10-second pulses over six minutes) leads to better separation between the strains in PC space (Figure 4B). Although the SOD-1(+) wild-type control strain (blue) and the SOD-1(A4V) mutant disease strain (orange) clearly separate into their own clusters, the SOD-1(H71Y), SOD-1(G85R), and SOD-1(0) null strains cluster together, suggesting that their overall responses to blue light are similar. This clustering of the SOD-1(H71Y), SOD-1(G85R), and SOD-1(0) strains upon blue light stimulation is consistent with the previous finding that all three strains have loss of sod-1 function in glutamatergic neurons. By contrast, the SOD-1(A4V) strain overexpresses sod-1 in cholinergic neurons without affecting glutamatergic neurons 43 , and this disease strain forms its own cluster in the blue light PC space (Figure 4B).
Figure 4 (caption; beginning missing): [...] conditions. Features were extracted by Tierpsy Tracker. Each data point represents one plate average for the strain, with up to 12 independent wells per strain on every 96-well plate. Each well contained an average of three worms. The time window represented in B is also shown in D. C) Changes in the overall fraction of forward (top) or paused (bottom) locomotion upon blue light stimulation. The difference was calculated by subtracting the average feature values over the t = 50-60 second pre-stimulus window from those over the t = 65-75 second first blue light pulse window (these correspond to the first and second time points in D, respectively). Plate averages were used to generate the plot for each strain. Two-sample t-test compared with the SOD-1(+) control strain: ** p<0.01; *** p<0.001. D) Overall fraction of forward locomotion under blue light imaging conditions. Three 10-second blue light pulses (blue shading) started at t = 60, 160, and 260 seconds, and feature values were calculated using 10-second windows centred around 5 seconds before, 10 seconds after, and 20 seconds after the beginning of each blue light pulse. Plate averages were used to generate the plot for each strain. Points are means of these plate averages and error bars show the standard deviation.
Behaviourally, the previous study reported an increased escape response from noxious paraquat in the three sod-1 loss-of-function strains, caused by increased neuromuscular junction function 43 . Our results using blue light as a noxious stimulus reproduce these paraquat-induced differential escape responses. Upon blue light stimulation, the SOD-1(H71Y), SOD-1(G85R), and SOD-1(0) strains show significantly larger increases in forward locomotion than the SOD-1(+) control strain and the SOD-1(A4V) disease strain (Figure 4C, top). This increase in forward movement appears to come primarily at the expense of pausing (Figure 4C, bottom) rather than backward locomotion (Supplementary Figure 2A). Nevertheless, a closer look at reversal frequencies at finer temporal resolution reveals decreased reversals in the three sod-1 loss-of-function strains but not in the other two strains (Supplementary Figure 2B). Finally, the level of blue light-induced escape response is sustained across the three light pulses for all five strains, with no obvious habituation effect (Figure 4D and Supplementary Figure 2B).
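The comparison in Figure 4C (per-plate change in forward fraction between the t = 50-60 s and t = 65-75 s windows, compared with the control strain by a two-sample t-test) can be sketched with SciPy. The strain labels and plate values below are invented; SciPy's `ttest_ind` assumes equal variances by default, and the original analysis may have used different settings.

```python
import numpy as np
from scipy.stats import ttest_ind

def compare_to_control(pre, pulse, control):
    """Compare blue-light-induced changes with a control strain.

    pre, pulse: dict mapping strain -> plate-average forward fractions over the
                t = 50-60 s (pre-stimulus) and t = 65-75 s (first pulse) windows.
    Returns strain -> (mean change, two-sample t-test p-value vs control).
    """
    delta = {s: np.asarray(pulse[s]) - np.asarray(pre[s]) for s in pre}
    results = {}
    for strain, d in delta.items():
        if strain == control:
            continue
        p_value = ttest_ind(d, delta[control]).pvalue
        results[strain] = (float(d.mean()), float(p_value))
    return results

# Invented plate averages for a control and one other strain (four plates each).
pre = {"control": [0.20, 0.25, 0.22, 0.21], "strain_x": [0.20, 0.22, 0.21, 0.23]}
pulse = {"control": [0.30, 0.34, 0.33, 0.31], "strain_x": [0.50, 0.54, 0.49, 0.55]}
results = compare_to_control(pre, pulse, control="control")
print(results)
```

Taking the difference within each plate before testing removes plate-to-plate offsets in baseline activity, so the test compares responses rather than absolute levels.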

Phenotypic screen of human-approved drugs
We used a library of 245 drugs that have previously been shown to accumulate in worms 44 to quantify worms' responses to human-approved drugs across multiple behavioural features. Three worms were added to each well of 96-well plates and were left on the drug for four hours before imaging. We extracted the Tierpsy256 features from each imaging condition (pre-stimulus, blue light stimulus, post-stimulus) and concatenated the feature vectors so that each well was represented by a 768-dimensional feature vector. We used a linear mixed model to identify compounds that had a significant effect on behaviour in at least one feature, as previously described 45 . The linear mixed model used the imaging day as a random effect to account for day-to-day variation in the data. The 153 compounds that had a detectable effect were kept for further analysis. The features were then z-normalised, and both features and samples were hierarchically clustered using complete linkage and correlation as the similarity measure (Figure 5A).
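The concatenation, z-normalisation, and two-way hierarchical clustering described above can be sketched with SciPy. The helper below is illustrative rather than the authors' code; rows are samples (for example, per-compound averages), and the synthetic example uses two invented groups with opposite feature profiles.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

def cluster_features(prestim, bluelight, poststim):
    """Concatenate per-period features, z-normalise, and cluster rows and columns.

    Each input has shape (n_samples, 256); the combined matrix has 768 columns.
    Returns the reordered matrix (heatmap order) and the row/column linkages.
    """
    X = np.hstack([prestim, bluelight, poststim])
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - X.mean(axis=0)) / sd                      # z-normalise each feature
    # Complete linkage with correlation distance (1 - Pearson r), as in the text.
    row_link = linkage(Z, method="complete", metric="correlation")
    col_link = linkage(Z.T, method="complete", metric="correlation")
    ordered = Z[leaves_list(row_link)][:, leaves_list(col_link)]
    return ordered, row_link, col_link

rng = np.random.default_rng(2)
profile = rng.normal(size=768)
samples = np.vstack([profile + 0.3 * rng.normal(size=(5, 768)),     # group 1
                     -profile + 0.3 * rng.normal(size=(5, 768))])   # group 2
ordered, row_link, col_link = cluster_features(*np.split(samples, 3, axis=1))
print(leaves_list(row_link))
```

Correlation distance groups compounds by the shape of their behavioural profile rather than its magnitude, so a weak and a strong compound with the same mode of action can still fall in the same cluster.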
The compounds in the library are mostly well characterised, with known modes of action. Examining the clusters in detail, we found several that included multiple compounds with the same mode of action (Figure 5B-D). One of the identified clusters contains several antipsychotic compounds (Figure 5B). Several antipsychotics have been shown to have direct effects on the C. elegans nervous system [46][47][48] . The most clearly defined cluster (Figure 5C) contains antibiotics. Because the worms were imaged on a lawn of bacterial food, the most likely cause of these behavioural differences is a change in the bacterial food lawn that worms sense and respond to, but a direct effect on the worms cannot be ruled out, since C. elegans do respond to some antibiotics 49,50 . A third cluster is enriched for histamine H1 receptor antagonists. Based on sequence similarity, there are no obvious orthologs of the human HRH1 receptor in C. elegans 51 , so the similarity in behavioural effects between these compounds may be driven by off-target effects. For example, the antihistamine epinastine has direct effects on C. elegans octopamine receptors 52 .
Most of the compounds had a detectable effect on behaviour, but many of the effects were less pronounced than those of a library of invertebrate-targeting compounds that we recently screened using the same method 45 . Part of the explanation is likely to be a lack of conservation of some drug targets between humans and worms, although many targets are sufficiently conserved that human-targeted drugs act through the expected receptor class 53 . Another reason some compounds have no detectable effect is drug uptake, which is a known issue for drug screens in worms 44 , highlighting the continuing need for improved drug delivery to maximise the usefulness of worms in drug screening 54 .

Discussion
We have developed a megapixel camera array system to enable high-throughput, high-content imaging of worms in standard multiwell plates. By partially overlapping the fields of view of six cameras, we can image an entire 96-well plate at spatial and temporal resolutions that are sufficient for tracking C. elegans and extracting high-dimensional phenotypic fingerprints. We have added features to Tierpsy Tracker to make it compatible with the multiwell imaging format, so that each well is detected and analysed separately. We incorporated bright blue LEDs into the camera array system to provide precise and repeatable photostimulation and found that this leads to better separation between wild isolates and between ALS disease model strains, in the latter case revealing phenotypes that could not be detected in standard unstimulated assays. Repeated blue light stimulation also revealed a novel sensitisation phenotype in N2 worms, in marked contrast to our initial expectation of habituation, as reported in previous experiments on repeated mechanical stimuli, which are used to study learning in worms 12,41 .
Our imaging hardware and analysis software are designed to support high-throughput phenotypic screening, as the multiwell format allows a large number of experiments to be conducted simultaneously. Furthermore, our experimental pipeline uses liquid-handling robots to dispense agar, food, drugs, and worms, streamlining the workflow for large-scale phenotypic screening. On a typical eight-hour imaging day, a single experimenter can perform five imaging runs on each of the five camera array units, thus collecting imaging data from 2400 independent wells in 96-well plate format. Typical post-acquisition processing time for this volume of data (assuming the standard 16-minute recording at 25 fps and three worms per well) is 50-85 hours on a Mac Pro (2.7 GHz 12-core Intel Xeon E5 processor; 64 GB 1866 MHz DDR3 memory) to go from raw video data to fully extracted behavioural features. Processing time increases substantially with the number of objects and depends on video quality (good contrast, lack of debris, etc.).
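The throughput figures above follow from simple arithmetic, sketched below using only numbers given in this paper (five units, five runs per day, three videos per camera per run, 5 + 6 + 5 minutes of recording per run, and 50-85 hours of processing). Counting one video file per camera per recording is our assumption.

```python
# Daily throughput implied by the figures quoted in the text.
UNITS = 5                      # camera array units
RUNS_PER_DAY = 5               # imaging runs per unit on an eight-hour day
WELLS_PER_PLATE = 96
CAMERAS_PER_UNIT = 6
VIDEOS_PER_CAMERA_RUN = 3      # pre-stimulus, blue light, post-stimulus (assumed one file each)
MINUTES_PER_RUN = 5 + 6 + 5
PROCESSING_HOURS = (50, 85)    # quoted range for one day's data on one Mac Pro

plates_per_day = UNITS * RUNS_PER_DAY
wells_per_day = plates_per_day * WELLS_PER_PLATE
video_files = plates_per_day * CAMERAS_PER_UNIT * VIDEOS_PER_CAMERA_RUN
camera_hours = plates_per_day * CAMERAS_PER_UNIT * MINUTES_PER_RUN / 60
minutes_per_well = [h * 60 / wells_per_day for h in PROCESSING_HOURS]

print(wells_per_day, video_files, camera_hours, minutes_per_well)
```

The result, roughly 1.25-2.1 minutes of single-machine compute per well, is a cost that divides across machines because each video is processed independently.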
A main strength of our camera array system is its scalability. Screening throughput can be readily expanded with additional imaging units, as the system is modular and each camera array has a relatively small physical footprint. Motif software enables on-the-fly compression of raw videos during acquisition, thereby keeping data volume manageable. Post-acquisition analysis is easily parallelised since videos can be analysed independently and processing time can be decreased linearly by allocating more computational cores to the task (e.g. by using a high-performance cluster).
The megapixel camera arrays we describe here represent a natural progression in worm tracking hardware where advances in the past have come from multiplexing to increase throughput 13 and increasing resolution to get more information from multi-worm trackers 12 .
Our new system will make it possible to do higher throughput screening with a resolution that enables the full suite of computational ethology tools to be brought to bear on phenotyping. We anticipate this will open new directions in large scale behaviour quantification with applications in genetics, disease modelling, and drug screening.

Materials and Methods
Worm strains
C. elegans strains used in this work are listed in Supplementary Table 1. Worms are cultured on Nematode Growth Medium (NGM) agar at 20 ºC and fed with E. coli OP50 following standard procedures 18 .

Standard phenotyping assay
The standard phenotyping assay was used for most experiments in this work unless otherwise noted (detailed protocol: https://dx.doi.org/10.17504/protocols.io.bsicncaw). See Supplementary Table 2 for the detailed protocols used to collect the data shown in each figure panel.
On the day of imaging, synchronised Day 1 adults were washed in M9 (detailed protocol: https://dx.doi.org/10.17504/protocols.io.bfqbjmsn) and dispensed into the wells of imaging plates using a COPAS 500 Flow Pilot worm sorter (detailed protocol: https://dx.doi.org/10.17504/protocols.io.bfc9jiz6). Three worms were placed into each well unless noted otherwise. Following liquid handling, plates were returned to a 20 °C incubator for 1 hour to dry and then placed on the multi-camera tracker for 30 minutes to acclimatise before image acquisition.

Drug experiments
Drug experiments followed the standard phenotyping assay workflow, but with a few modifications. A detailed protocol can be found at http://dx.doi.org/10.17504/protocols.io.bs6znhf6.
Briefly, imaging plates were prepared with drugs the day before imaging and stored overnight in the dark at 4 °C. Using a COPAS 500 Flow Pilot, three worms were dispensed into each well of the 96-well plates. Following liquid handling, plates were kept in a 20 °C incubator for three hours in addition to the standard one-hour drying period, giving a total drug exposure time of four hours.

Image acquisition
All videos were acquired at 25 fps on the trackers in a temperature-controlled room at 20 °C, with a shutter time of 25 ms and a resolution of 12.4 µm/px. Unless otherwise noted, three sequential videos were recorded, run in series by a script: a 5-minute prestimulus video, a 6-minute blue light recording with 10-second, 100%-intensity blue light pulses starting at 60, 160, and 260 s, and a 5-minute post-stimulus video. The timing of recordings and photostimulation was controlled by a script using Loopbio's API for the Motif software [https://github.com/loopbio/python-motifapi].
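For analysis it is often convenient to know, frame by frame, whether the blue light was on. A minimal sketch for the 6-minute stimulus recording, assuming pulses begin exactly at the stated onsets and frames are evenly spaced at 25 fps; it does not use the Motif API, and `light_mask` is a hypothetical helper.

```python
import numpy as np


def light_mask(duration_s=360, fps=25, onsets_s=(60, 160, 260), pulse_s=10):
    """Boolean array with one entry per frame, True while the blue light is on."""
    t = np.arange(int(duration_s * fps)) / fps       # timestamp of each frame (s)
    mask = np.zeros(t.shape, dtype=bool)
    for onset in onsets_s:
        mask |= (t >= onset) & (t < onset + pulse_s)  # half-open pulse window
    return mask
```

With the defaults the recording has 9000 frames, of which 750 (3 pulses × 10 s × 25 fps) are illuminated.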
For the serial blue light stimulation experiments, the plates were continuously imaged for 43 minutes and 20 seconds with the following stimulation pattern: 5 minutes off, 20 × (10 s on, 90 s off), 5 minutes off.
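The stated total duration follows from the segment lengths (300 s + 20 × 100 s + 300 s = 2600 s). The hypothetical helper below builds the serial protocol as a list of (state, duration) segments and returns the total duration and the pulse onset times, which can be checked against the recording length.

```python
def serial_protocol(n_pulses=20, on_s=10, off_s=90, pre_s=300, post_s=300):
    """Return (total duration in s, list of pulse onset times in s)."""
    segments = ([("off", pre_s)]
                + [("on", on_s), ("off", off_s)] * n_pulses
                + [("off", post_s)])
    onsets, t = [], 0
    for state, duration in segments:
        if state == "on":
            onsets.append(t)          # light switches on at the start of this segment
        t += duration
    return t, onsets
```

With the defaults, `divmod(total, 60)` gives `(43, 20)`, i.e. 43 minutes 20 seconds, and the pulses start at 300, 400, ..., 2200 s.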
Image processing and quality control
Segmentation, tracking, and pose estimation over time were performed using Tierpsy Tracker. Each video was checked using Tierpsy Tracker's Viewer, and wells with visible contamination, agar damage, or excess liquid left by the worm sorter (which causes worms to swim rather than crawl) were marked as bad and excluded from the analysis.

Convolutional neural network to exclude non-worm objects
We improved Tierpsy tracking by adding a CNN classifier after segmentation, which prevents non-worm objects from being analysed and skewing the results.
In the video compression step at the beginning of the Tierpsy analysis pipeline, a segmentation algorithm detects putative worm objects according to a set of user-defined parameters. Pixels farther than a threshold distance from every putative worm are set to 0, creating a "masked video". The objects selected by the masking algorithm are then tracked throughout the video, but now only if they pass a filtering step performed by the CNN classifier.
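The masking rule can be sketched with a Euclidean distance transform. This is an illustration rather than Tierpsy's implementation: it assumes a boolean mask of putative worm pixels is already available from segmentation, and `mask_frame` and `max_dist_px` are hypothetical names.

```python
import numpy as np
from scipy import ndimage


def mask_frame(frame, worm_mask, max_dist_px):
    """Set to 0 every pixel farther than max_dist_px from any putative worm pixel."""
    frame = np.asarray(frame)
    worm_mask = np.asarray(worm_mask, dtype=bool)
    if not worm_mask.any():
        return np.zeros_like(frame)      # no putative worms: everything is masked
    # distance_transform_edt gives, for each non-zero input pixel, the distance to
    # the nearest zero pixel; with ~worm_mask as input, zeros are the worm pixels
    dist = ndimage.distance_transform_edt(~worm_mask)
    masked = frame.copy()
    masked[dist > max_dist_px] = 0
    return masked
```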
The classifier was trained on a dataset of 43,561 greyscale "masked" images of 80 × 80 pixels each, collected across several imaging systems in our lab. Each image was manually annotated as either "worm" or "non-worm" by two independent researchers, so that a consensus could be reached. The annotated dataset was split into training, validation, and test sets containing 80%, 10%, and 10% of the images, respectively, keeping the classes balanced in each set. All images were pre-processed in two steps. First, the background pixels set to 0 by the masking algorithm were replaced with the 95th percentile of the grey values in the unmasked area. This prevents the artificial edge between the masked and unmasked areas from disproportionately influencing the classifier. Second, all pixel values were scaled to the range 0 to 1 by min-max normalisation, to reduce the influence of variable illumination and contrast across imaging setups.
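A minimal NumPy sketch of the two pre-processing steps, assuming masked pixels are exactly 0 and that the percentile uses NumPy's default linear interpolation (the text does not specify either detail); the function name is hypothetical.

```python
import numpy as np


def preprocess_masked_image(img, pct=95):
    """Fill masked (zero) pixels with the pct-th percentile of the unmasked grey
    values, then min-max normalise the whole image to [0, 1]."""
    img = np.asarray(img, dtype=float)
    unmasked = img > 0
    if not unmasked.any():
        return np.zeros_like(img)
    out = img.copy()
    out[~unmasked] = np.percentile(img[unmasked], pct)   # hide the artificial mask edge
    lo, hi = out.min(), out.max()
    if hi == lo:
        return np.zeros_like(out)                        # flat image: avoid dividing by 0
    return (out - lo) / (hi - lo)
```

Filling with a high percentile rather than the maximum makes the background resemble typical bright agar while staying robust to a few saturated pixels.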
The architecture of the CNN is a shallower adaptation of VGG16 55 , featuring eight convolution layers with 3 × 3 filters and stride 1, each followed by a rectified linear unit (ReLU) activation; four max-pooling layers (2 × 2 filter, stride 2), one after every two convolution layers; and a fully connected layer. Batch normalisation is applied to the third and seventh convolution layers to accelerate training by reducing internal covariate shift 56 , and a dropout layer is added before the fully connected layer to prevent overfitting 57 . In total, the CNN has about 1.78 million trainable parameters.
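A PyTorch sketch of the layer pattern described above. The channel widths (doubling every two layers, as in VGG), `padding=1`, the dropout probability, and the two-class output are assumptions not given in the text, so this sketch has about 1.19 million trainable parameters rather than the reported 1.78 million; the class name is hypothetical.

```python
import torch
from torch import nn


class WormClassifierSketch(nn.Module):
    """VGG-style binary classifier for 80 x 80 greyscale crops (assumed widths)."""

    def __init__(self, widths=(32, 32, 64, 64, 128, 128, 256, 256),
                 n_classes=2, p_drop=0.5, img_size=80):
        super().__init__()
        layers, in_ch = [], 1
        for i, out_ch in enumerate(widths, start=1):
            # 3 x 3 convolution, stride 1; padding=1 keeps the spatial size (assumption)
            layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1))
            if i in (3, 7):
                # batch normalisation on the 3rd and 7th convolution layers
                layers.append(nn.BatchNorm2d(out_ch))
            layers.append(nn.ReLU(inplace=True))
            if i % 2 == 0:
                # 2 x 2 max-pooling (stride 2) after every second convolution layer
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        side = img_size // 2 ** (len(widths) // 2)   # 80 -> 5 after four poolings
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p_drop),                      # dropout before the fully connected layer
            nn.Linear(in_ch * side * side, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

A batch of crops with shape `(N, 1, 80, 80)` yields logits of shape `(N, 2)`.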
The CNN classifier was implemented in PyTorch 1.6 and trained with the cross-entropy loss function and the Adam optimisation algorithm 58 at a learning rate of 10⁻⁴. It achieved an accuracy of 97.68% and an F1 score of 97.98% on the independent test set.
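In PyTorch, the loss and optimiser named above correspond to `torch.nn.CrossEntropyLoss` and `torch.optim.Adam(model.parameters(), lr=1e-4)`. The sketch below shows only how the two reported test-set metrics are defined, assuming "worm" is treated as the positive class for F1 (the text does not say which class is positive); the function name is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive="worm"):
    """Accuracy over all labels and F1 score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # F1 = 2 * precision * recall / (precision + recall) = 2TP / (2TP + FP + FN)
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    return accuracy, f1
```

Because F1 ignores true negatives, it can differ from accuracy even on a balanced test set, as in the two values reported above.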
To reduce the computational cost of inference, we apply the CNN to only a subset of the images featuring the same putative worm object (one image per second). This yields, for each sampled snapshot, the probability that the object is a valid worm. If the median of these probabilities over time is higher than 0.5, the object is classified as a valid worm.
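The inference rule (one snapshot per second, median probability above 0.5) can be sketched as follows. `predict_proba` stands in for the trained CNN and is a hypothetical callable; at 25 fps, one snapshot per second means taking every 25th frame.

```python
import statistics


def classify_object(frames, predict_proba, fps=25, threshold=0.5):
    """Decide whether one tracked object is a worm.
    frames: per-frame crops of the object; predict_proba: crop -> P(worm)."""
    sampled = frames[::fps]                          # one crop per second of video
    probs = [predict_proba(crop) for crop in sampled]
    return statistics.median(probs) > threshold
```

Using the median rather than the mean means a few misclassified snapshots (for example, when the worm is coiled or briefly overlaps debris) do not flip the decision.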
Video processing with multiple wells
Using multiwell plates for imaging significantly increased experimental throughput, but also introduced challenges for data analysis, as each video contains 16 separate wells. Additional software was therefore needed to process multiwell videos so that wells are detected and analysed separately.
To achieve this, we implemented an algorithm in Tierpsy Tracker that automatically detects multiple wells in a field of view and stores the coordinates of the well boundaries. Briefly, we created a template that approximates the appearance of a well in the video and replicated it on a lattice to simulate the grid of wells. The overall dimensions of the lattice are defined in Tierpsy's configuration file, while the lattice spacing parameters are chosen, via SciPy's differential evolution routine 59 , to minimise the difference between the video's first frame (or its static background, if Tierpsy is instructed to calculate it) and the simulated grid of wells.
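A minimal sketch of lattice fitting with `scipy.optimize.differential_evolution`. The soft-edged disc template, the sum-of-squared-differences cost, the fixed well radius, and the three free parameters (first well centre and one spacing shared by rows and columns) are assumptions for illustration; Tierpsy's actual template and cost may differ. The image is assumed to be scaled so that well interiors are near 1 and the gaps between wells near 0.

```python
import numpy as np
from scipy.optimize import differential_evolution


def render_wells(params, shape, n_rows, n_cols, radius, edge=1.5):
    """Simulated image of an n_rows x n_cols lattice of bright, soft-edged discs.
    params = (x0, y0, spacing): first well centre and lattice spacing in pixels."""
    x0, y0, spacing = params
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    cx = x0 + spacing * np.arange(n_cols)
    cy = y0 + spacing * np.arange(n_rows)
    # squared distance from every pixel to every well centre: shape (H, W, rows, cols)
    d2 = (xx[..., None, None] - cx) ** 2 + (yy[..., None, None] - cy[:, None]) ** 2
    dist = np.sqrt(d2.min(axis=(-2, -1)))          # distance to the nearest centre
    # logistic edge keeps the cost smooth in the lattice parameters
    return 1.0 / (1.0 + np.exp((dist - radius) / edge))


def fit_well_grid(image, n_rows, n_cols, radius, bounds, seed=0):
    """Lattice parameters minimising the squared difference to the image."""
    image = np.asarray(image, dtype=float)

    def cost(params):
        sim = render_wells(params, image.shape, n_rows, n_cols, radius)
        return float(np.sum((sim - image) ** 2))

    result = differential_evolution(cost, bounds, seed=seed, maxiter=200)
    return result.x
```

Bounding the offsets to less than one spacing from the image edge avoids solutions shifted by a whole well; differential evolution then searches that bounded box globally, and its default local polishing step refines the best candidate.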
Automatic extraction of behavioural features was then performed on a per-worm basis, after which worms were assigned to their respective wells based on their (x, y) coordinates to obtain well-averaged behavioural features.
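Assigning worms to wells and averaging per well maps naturally onto a pandas `groupby`. In this sketch each worm is one row with its (x, y) position and its per-worm features; the rectangular well boundaries, column names, and function names are hypothetical.

```python
import pandas as pd


def assign_wells(worms, wells):
    """worms: DataFrame with columns x, y plus feature columns.
    wells: dict well_name -> (x_min, x_max, y_min, y_max) in pixels."""
    def which_well(x, y):
        for name, (x0, x1, y0, y1) in wells.items():
            if x0 <= x < x1 and y0 <= y < y1:
                return name
        return None                         # worm lies outside every detected well

    out = worms.copy()
    out["well_name"] = [which_well(x, y) for x, y in zip(out["x"], out["y"])]
    return out


def well_averages(worms, wells, feature_cols):
    """Average each feature over the worms assigned to each well."""
    labelled = assign_wells(worms, wells).dropna(subset=["well_name"])
    return labelled.groupby("well_name")[feature_cols].mean()
```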

Data provenance
Tracking multiwell plates complicates the handling of metadata, since there is no longer a one-to-one mapping between videos and experimental conditions. When well shuffling is performed using the liquid handling robot, the well contents of the imaging plate also need to be tracked. To handle experimental metadata for imaging with the camera arrays, we standardised the records that need to be compiled manually during experiments and developed an open-source Python module (https://github.com/Tierpsy/tierpsy-tools-python/hydra) that combines the experimental records into a full metadata table with the experimental conditions for each well (Supplementary Figure 3).
The experimental records are typically compiled as CSV files. On each tracking day, the experimenter needs to record: i) the media type and bacterial food on the imaging plates, and the worm strains dispensed into the wells of the plates (recorded in summary form in the wormsorter.csv file); ii) information about the experimental runs, including the unique IDs of the imaging plates, the name of the instrument on which each plate was imaged, and the environmental conditions (manual_metadata.csv); and iii) if applicable, the contents of the compound source plates (sourceplate.csv) and the mapping between imaging plates and source plates (if the liquid handling robot was used for column shuffling, this mapping is recorded automatically in robotlog.csv; otherwise it is recorded in imaging2source.csv).
Using the functions in the hydra module, a plate metadata table is first created that contains all the well-specific experimental conditions for every well of each unique imaging plate, including the compound contents if applicable. The information about the experimental runs is then merged with the plate metadata to create a final metadata table with the complete experimental conditions for every recording of every well. At this stage, the video filenames are also matched to each sample based on the camera array instrument ID. For example scripts showing metadata handling, see https://github.com/Tierpsy/tierpsy-tools-python/tree/master/examples/hydra_metadata.
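The merging steps described above correspond to ordinary table joins. The sketch below uses pandas `merge` with hypothetical column names, instrument names, and toy data; it illustrates the idea and is not the hydra module's API.

```python
import pandas as pd


def build_metadata(plate_md, run_md, videos):
    """plate_md: one row per well of each imaging plate (strain, compound, ...).
    run_md: one row per imaging run (plate ID, instrument, run number, ...).
    videos: video filenames indexed by instrument and run number."""
    # every well inherits the run-level conditions of the plate it belongs to
    md = plate_md.merge(run_md, on="imaging_plate_id", how="inner")
    # attach video files via the camera array instrument and run they were recorded in
    return md.merge(videos, on=["instrument_name", "imaging_run_number"], how="left")
```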

Analysis of time-resolved response to photostimulation
Tierpsy Tracker 17 was used to calculate a set of 3076 summary features for each well in each non-overlapping 10 s interval of the 6-minute stimulus recording (with three 10-second blue light pulses starting at 60, 160, and 260 s). Samples for which more than 40% of the features could not be calculated were excluded from the analysis, as was any feature that could not be calculated for more than 20% of the samples in any of the 10 s intervals. Missing values were then imputed with the mean of the valid values of that feature within the same time interval. The feature matrix (all wells in all time intervals) was then z-normalised, and principal components were calculated from the whole feature matrix. Figure 3A shows a density plot of the measurements collected in the 10 s immediately before (left) and immediately after (right) a 10-second stimulus, projected onto the plane defined by the first two principal components.
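The filtering, imputation, scaling, and PCA steps can be sketched with NumPy. Assumptions: `X` is a samples × features array with NaN for features that failed to be calculated, `interval` labels the 10 s interval of each sample, samples are filtered before features, and PCA is computed by singular value decomposition of the z-normalised matrix; the function name is hypothetical.

```python
import numpy as np


def preprocess_and_pca(X, interval, max_nan_sample=0.4, max_nan_feature=0.2,
                       n_components=2):
    X = np.asarray(X, dtype=float)
    interval = np.asarray(interval)
    # 1. drop samples with more than 40% missing features
    keep_rows = np.isnan(X).mean(axis=1) <= max_nan_sample
    X, interval = X[keep_rows], interval[keep_rows]
    # 2. drop features missing in more than 20% of samples in any interval
    keep_cols = np.ones(X.shape[1], dtype=bool)
    for t in np.unique(interval):
        keep_cols &= np.isnan(X[interval == t]).mean(axis=0) <= max_nan_feature
    X = X[:, keep_cols]
    # 3. impute each remaining NaN with that feature's mean within its interval
    for t in np.unique(interval):
        rows = interval == t
        block = X[rows]                      # boolean indexing returns a copy
        col_means = np.nanmean(block, axis=0)
        r, c = np.where(np.isnan(block))
        block[r, c] = col_means[c]
        X[rows] = block
    # 4. z-normalise each feature over all wells and intervals
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                        # guard against constant features
    Z = (X - X.mean(axis=0)) / sd
    # 5. principal components from the SVD of the centred, scaled matrix
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    return scores, keep_rows, keep_cols
```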
To investigate the response to photostimulation at higher temporal resolution, Tierpsy Tracker 17 was used to detect the motion mode (forwards, backwards, or stationary) of each worm over time. To calculate the fraction of worms in each motion mode over time (Figure 3B), the number of worms in each motion mode at each time point in each well was divided by the total number of tracked worms at that time point in that well. This gave the fraction of worms in each motion mode at each time point for each well, so that an average could be taken across all wells. The 95% confidence interval of this average was obtained by nonparametric bootstrap (n = 1000 resamplings with replacement). For Figure 3C-E, the motion mode detected by Tierpsy Tracker for each worm was first down-sampled to 0.5 Hz by dividing the video into non-overlapping 2-second intervals and taking the most frequent motion mode in each interval. The fraction of the worm population in each motion mode over time was calculated by counting the number of worms in each motion mode and dividing by the total number of worms detected at each time point. The 95% confidence interval was calculated by nonparametric bootstrap using the seaborn Python library.
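The three computations in this paragraph (per-well fractions, a bootstrap confidence interval across wells, and 0.5 Hz down-sampling by majority vote) are sketched below. Motion modes are represented as strings, wells with no tracked worms are skipped, and an incomplete final 2 s bin is dropped; these choices, the percentile bootstrap, and the function names are assumptions, and seaborn's own bootstrap may differ in detail.

```python
from collections import Counter

import numpy as np


def downsample_modes(modes, fps=25, bin_s=2.0):
    """Most frequent motion mode in each non-overlapping bin (2 s bins -> 0.5 Hz)."""
    n = int(round(fps * bin_s))
    return [Counter(modes[i:i + n]).most_common(1)[0][0]
            for i in range(0, len(modes) - n + 1, n)]


def fraction_per_well(modes_by_well, mode):
    """Fraction of tracked worms in `mode` at one time point, for each well."""
    return np.array([sum(m == mode for m in worms) / len(worms)
                     for worms in modes_by_well if len(worms) > 0])


def bootstrap_mean_ci(values, n_boot=1000, ci=95, seed=0):
    """Mean across wells with a percentile bootstrap confidence interval."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # resample wells with replacement, n_boot times, and average each resample
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (100 - ci) / 2
    lo, hi = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), lo, hi
```

Resampling whole wells, rather than individual worms, treats the well as the independent experimental unit.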

Classification of wild isolates
For the classification of the divergent set we used a random forest classifier as implemented in scikit-learn 60 . For feature selection we used recursive feature elimination (RFE) with a random forest estimator, as implemented in scikit-learn 60 . We started by randomly splitting the data into a training/tuning set and a test set, with 20% of the data from each strain assigned to the test set. We used the training/tuning set for feature selection, testing candidate feature set sizes of 2^i for i = 7 to 11 (i.e. 128, 256, 512, 1024, and 2048 features). For each size, we performed cross-validation in which we: i) used each training fold to select N features and train a classifier with the selected features; and ii) used each test fold to estimate the classification accuracy. We repeated this process 20 times to obtain statistical estimates of the mean cross-validation accuracy for each size and selected the best-performing size, Nbest. We then selected Nbest features using the entire training/tuning set and used this feature set for downstream analysis. In a second stage, we tuned the hyperparameters of the random forest classifier using grid search with cross-validation as implemented in scikit-learn 60 , with the grid shown in Table 1; the best-performing parameters are also reported in Table 1. Finally, we trained a classifier on the entire training set using the selected features and hyperparameters and used it to make predictions on the test set.
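A scikit-learn sketch of the two-stage procedure. Assumptions not stated in the text: stratified k-fold cross-validation repeated with `RepeatedStratifiedKFold` (5 folds by default), RFE removing 10% of the remaining features per iteration (`step=0.1`; scikit-learn's default removes one feature at a time, which is slow for about 3000 features), 100 trees, and a caller-supplied grid standing in for Table 1. Function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     train_test_split)


def choose_n_features(X, y, candidate_sizes, n_splits=5, n_repeats=20,
                      n_trees=100, seed=0):
    """Mean repeated-CV accuracy for each candidate feature-set size.
    Features are re-selected inside every training fold to avoid leakage."""
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                 random_state=seed)
    mean_acc = {}
    for n in candidate_sizes:
        scores = []
        for train, test in cv.split(X, y):
            rfe = RFE(RandomForestClassifier(n_estimators=n_trees, random_state=seed),
                      n_features_to_select=n, step=0.1).fit(X[train], y[train])
            clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
            clf.fit(rfe.transform(X[train]), y[train])
            scores.append(clf.score(rfe.transform(X[test]), y[test]))
        mean_acc[n] = float(np.mean(scores))
    return max(mean_acc, key=mean_acc.get), mean_acc


def fit_pipeline(X, y, candidate_sizes, param_grid, seed=0, **cv_kwargs):
    # hold out 20% of each class (strain) as the test set
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                              random_state=seed)
    n_best, _ = choose_n_features(X_tr, y_tr, candidate_sizes, seed=seed, **cv_kwargs)
    # select n_best features using the whole training/tuning set
    rfe = RFE(RandomForestClassifier(random_state=seed),
              n_features_to_select=n_best, step=0.1).fit(X_tr, y_tr)
    # grid search with CV; refit=True (default) retrains the best model on all of X_tr
    search = GridSearchCV(RandomForestClassifier(random_state=seed), param_grid, cv=5)
    search.fit(rfe.transform(X_tr), y_tr)
    test_accuracy = search.score(rfe.transform(X_te), y_te)
    return n_best, search.best_params_, test_accuracy
```

With real data one would pass `candidate_sizes=[2**i for i in range(7, 12)]`; smaller sizes and fewer repeats keep a toy run fast.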