Development of AI-assisted microscopy frameworks through realistic simulation in pySTED

The integration of artificial intelligence (AI) into microscopy systems significantly enhances performance, optimizing both the image acquisition and analysis phases. Development of AI-assisted super-resolution microscopy is often limited by the access to large biological datasets, as well as by the difficulties of benchmarking and comparing approaches on heterogeneous samples. We demonstrate the benefits of a realistic STED simulation platform, pySTED, for the development and deployment of AI strategies for super-resolution microscopy. The simulation environment provided by pySTED allows the augmentation of data for the training of deep neural networks, the development of online optimization strategies, and the training of reinforcement learning models that can be deployed successfully on a real microscope.


Introduction
Super-resolution microscopy has played a pivotal role in life sciences by allowing the investigation of the nano-organization of biological samples down to a few tens of nanometers [1]. STimulated Emission Depletion (STED) microscopy [2], a point-scanning super-resolution fluorescence modality, routinely allows resolutions of 30-80 nm to be reached in fixed and live samples [1]. One drawback of STED microscopy is the photobleaching of the fluorophores associated with the increased light exposure at the sample [1,3,4]. Photobleaching results in a decrease in fluorescence, limiting the ability to capture multiple consecutive images of a particular area, and may also increase phototoxicity in living samples [4,5]. In an imaging experiment, photobleaching and phototoxicity need to be minimized by careful modulation of the imaging parameters [5,6] or by adopting smart-scanning schemes [7-9]. Integration of AI-assisted smart modules into bioimaging acquisition protocols has been proposed to guide and control microscopy experiments [6,7,10,11]. However, Machine Learning (ML) and Deep Learning (DL) algorithms generally require a large amount of annotated data to be trained, which can be difficult to obtain when working with biological samples.
Diversity in curated training datasets also enhances the model's robustness [12,13]. While large annotated datasets of diffraction-limited optical microscopy have been published in recent years [14,15], access to such datasets for super-resolution microscopy is still limited, in part due to the complexity of data acquisition and annotation as well as limited access to imaging resources.
To circumvent this limitation, simulation strategies have been employed for high-end microscopy techniques.
For instance, in Fluorescence Lifetime Imaging Microscopy (FLIM), it is common practice to use simulation software to generate synthetic measurements to train ML/DL models [16]. The models can be trained entirely in simulation or with a few real measurements. Researchers in Single Molecule Localization Microscopy (SMLM) have also adopted simulation tools in their image analysis pipelines to benchmark their algorithms [17-19]. Nehme et al. [20] could train a DL model with simulated ground truth detections and a few experimental images, which was then deployed on real images. In STED microscopy, simulation software is also available. However, it is limited to theoretical models of the point spread function (PSF) [21,22] or effective PSF (E-PSF) [8,23], without reproducing the realistic experimental settings that influence the design of STED acquisitions (e.g. photobleaching, structures of interest, scanning schemes). This limits the generation of simulated STED datasets and the associated training of ML/DL models for smart STED microscopy modules.
We created a simulation platform, pySTED, that emulates an in-silico STED microscope with the aim of assisting the development of AI methods. pySTED is founded on theoretical and empirically validated models that encompass the generation of the E-PSF in STED microscopy, as well as a photobleaching model [3,21,24,25]. Additionally, it implements realistic point-scanning dynamics in the simulation process, allowing adaptive scanning schemes and non-uniform photobleaching effects to be mimicked. Realistic samples are simulated in pySTED by using a DL model that predicts the underlying structure (datamaps) of real images.
pySTED can benefit the STED microscopy community by facilitating both the acquisition and analysis aspects (Extended Fig. 1). It also enables the research and development of reinforcement learning (RL) methods, which have allowed the successful control of complex systems on a wide variety of tasks in games, robotics, and even microscopy imaging [11,25-28]. RL methods can learn by interacting with the realistic STED environment, which would not be possible on a real system due to data constraints [29]. pySTED is implemented in a Google Colab notebook to help trainees develop their intuition regarding STED microscopy on a simulated system (Extended Fig. 1i). We demonstrate how the performance of a DL model trained on a semantic segmentation task of nanostructures can be increased using synthetic images from pySTED (Extended Fig. 1ii). A second experiment shows how our simulation environment can be leveraged to thoroughly validate the development of AI methods and challenge their robustness before deploying them in a real-life scenario (Extended Fig. 1iii).
Lastly, we show how RL agents can be trained to adjust STED imaging parameters in a synthetic pySTED task.The resulting trained agent can be deployed in real experimental conditions to resolve nanostructures and recover biologically relevant features by bridging the reality gap (Extended Fig. 1iv).

STED simulation with pySTED
We have built a realistic, open-sourced STED simulation platform within the Python environment, namely pySTED. pySTED breaks down a STED acquisition into its main constituents: wavelength-dependent focusing properties of the objective lens, fluorophore excitation and depletion, and fluorescence detection. Each step of the acquisition process corresponds to an independent component of the pipeline and is created with its own parameters (Supplementary Tables 1-4) that users can modify according to their experimental requirements (Figure 1a) [21]. Generating a synthetic image with the pySTED simulator requires the user to specify the positions of the emitters in the field of view (referred to as a datamap) and to provide the characteristics of the fluorophore (Figure 1a and Supplementary Table 5). The emission and photobleaching properties of the fluorophores implemented in pySTED are inspired by previous theoretical and experimental models [3,24]. As in a real experiment, the datamap is continuously updated during the simulation process to realistically simulate point-scanning acquisition schemes (Figure 1a-e, Methods).
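The acquisition steps described above can be sketched with a toy model. This is not the pySTED API, only an illustration of the core operations (E-PSF convolution, shot-noise sampling, first-order photobleaching of the datamap); all parameter values are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

def simulate_sted(datamap, fwhm_px, dwell_time, brightness, bleach_k=0.0):
    """Toy STED acquisition: convolve an emitter map with a Gaussian E-PSF,
    scale by dwell time and brightness, and sample Poisson shot noise.
    The datamap is bleached in place to mimic sequential acquisitions."""
    sigma = fwhm_px / 2.355  # FWHM -> Gaussian sigma
    expected = gaussian_filter(datamap * brightness * dwell_time, sigma)
    image = rng.poisson(expected).astype(float)
    datamap *= np.exp(-bleach_k * dwell_time)  # first-order photobleaching
    return image

# A sparse datamap of two point emitters
datamap = np.zeros((64, 64))
datamap[16, 16] = datamap[40, 44] = 100.0

sted = simulate_sted(datamap, fwhm_px=2.0, dwell_time=10e-6,
                     brightness=5e7, bleach_k=1e4)
```

Repeating the call on the same datamap yields progressively dimmer images, which is the behavior the pixel-wise datamap update in pySTED reproduces at the scanning level.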
Realistic datamap generation
Datamaps that can reproduce diverse biological structures of interest are required for the development of a simulation platform that enables the generation of realistic synthetic STED images. Combining primary object shapes such as points, fibers, or polygonal structures is efficient and simple for some use cases, but is not sufficient to represent the more complex and diverse structures found in real biological samples [17-19,30]. It is essential to reduce the gap between simulation and reality for microscopist trainees or to train artificial intelligence models on synthetic samples prior to deployment on real tasks [31,32].
We sought to generate realistic datamaps by training a DL model to predict the underlying structures from real STED images, which can then be used in synthetic pySTED acquisitions. We chose the U-Net architecture (U-Net datamap), as it has been shown to perform well on various microscopy datasets of limited size [33,34] (Figure 1f). We adapted a previously established approach in which a low-resolution image is mapped to a resolution-enhanced image [35,36]. Once convolved with an equivalent optical transfer function, the resolution-enhanced synthetic image is compared with the original image.
Here, we trained the U-Net datamap on STED images of proteins in cultured hippocampal neurons (Methods and Supplementary Fig. 1). During the training process, the model aims at predicting the underlying structure (datamap) such that its convolution with the approximated PSF of the STED microscope (full-width at half maximum (FWHM): ~50 nm, measured from the FWHM of real STED images) minimizes the mean quadratic error with the real image (Figure 1f). After training, given a real image, the U-Net datamap generates the underlying structure (Supplementary Fig. 1). From this datamap, a synthetic pySTED image can be simulated with different imaging parameters (low or high resolution). Qualitative comparison of the synthetic images acquired in pySTED with the real STED images (Supplementary Fig. 1) shows similar super-resolved structures for different neuronal proteins, confirming the capability of the U-Net datamap to predict a realistic datamap.
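The training objective above can be summarized in a few lines (a minimal numpy sketch: the actual model is a U-Net trained by backpropagation, and the ~50 nm FWHM is expressed here in pixels):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def datamap_consistency_loss(predicted_datamap, real_image, psf_fwhm_px):
    """U-Net datamap objective sketch: the predicted emitter map, blurred by
    the approximate STED PSF, should reproduce the real image under a mean
    squared error criterion."""
    sigma = psf_fwhm_px / 2.355  # FWHM -> sigma for a Gaussian PSF
    reconstructed = gaussian_filter(predicted_datamap, sigma)
    return float(np.mean((reconstructed - real_image) ** 2))

# Toy check: the true datamap reproduces the image exactly, a blank one does not
true_dm = np.zeros((32, 32))
true_dm[16, 16] = 10.0
real = gaussian_filter(true_dm, 1.0)  # stand-in for a real STED image
loss_true = datamap_consistency_loss(true_dm, real, psf_fwhm_px=2.355)
loss_blank = datamap_consistency_loss(np.zeros((32, 32)), real, psf_fwhm_px=2.355)
```

Because the loss is computed against the raw image rather than an annotated ground truth, this scheme needs no manual labels, which is what makes it practical for super-resolution data.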
Validation of pySTED with a real STED microscope
We characterized the capacity of pySTED to simulate realistic fluorophore properties by comparing synthetic pySTED images with real STED microscopy acquisitions. We acquired STED images of the protein bassoon, immunostained with the fluorophore ATTO-647N, in dissociated cultured hippocampal neurons. We compared the effect of varying the imaging parameters in the pySTED simulation environment and on the real microscope (Supplementary Fig. 2-4). For pySTED, we used the photophysical properties of the fluorophore ATTO-647N from the literature (Supplementary Tab. 5) [3,37]. The photobleaching constants (k1 and b) were estimated from the experimental data using a least-squares fitting method (Methods). Synthetic datamaps were generated with the U-Net datamap to facilitate the comparison between simulation and reality.
We first compared how the imaging parameters on the real microscope and in the pySTED simulations (pixel dwell time, excitation and depletion powers) influenced the image properties by measuring the resolution [38] and the signal ratio [6] (Methods and Supplementary Fig. 2a). As expected, modulating the STED laser power influences the spatial resolution in real experiments and in pySTED simulations. Examples of acquired and synthetic images are displayed in Supplementary Fig. 2b for visual comparison with different parameter combinations (Supplementary Fig. 2a). The impact of the imaging parameters on the resolution and signal ratio metrics in pySTED agrees with the measurements performed on a real microscope. The small deviations can be explained by the variability that is typically observed in the determination of absolute values of fluorophore properties [39].
Next, we validated the photobleaching model implemented within pySTED. We calculated the photobleaching by comparing the fluorescence signal in a low-resolution image acquired before (CONF1) and after (CONF2) the high-resolution acquisition [6] (Methods). For the pixel dwell time and the excitation power, we measured similar trends between real and synthetic image acquisitions (Supplementary Fig. 3a).
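This before/after measurement can be written down directly (a minimal sketch; the foreground-selection rule below is an assumption, not necessarily the one used in the Methods):

```python
import numpy as np

def photobleaching_ratio(conf1, conf2, foreground_mask=None):
    """Fractional loss of confocal signal between an image acquired before
    (CONF1) and after (CONF2) the high-resolution STED acquisition.
    Without an explicit mask, the brightest quartile of CONF1 serves as an
    assumed foreground estimate."""
    if foreground_mask is None:
        foreground_mask = conf1 >= np.percentile(conf1, 75)
    s1 = conf1[foreground_mask].mean()
    s2 = conf2[foreground_mask].mean()
    return 1.0 - s2 / s1

conf1 = np.full((16, 16), 100.0)
conf2 = np.full((16, 16), 80.0)   # 20% of the signal was lost
bleach = photobleaching_ratio(conf1, conf2)
```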
For a confocal acquisition, the photobleaching in pySTED is assumed to be 0 (Supplementary Fig. 3a), as it is generally negligible in a real confocal acquisition. Considering the flexibility of pySTED, different photobleaching dynamics specifically tailored to any particular experiment can be implemented and added to the simulation platform. Examples of sequential acquisitions (10 images) are presented in Supplementary Fig. 3b to demonstrate the effect of the imaging parameters on the photobleaching of the sample. pySTED also integrates background effects that can influence the quality of the acquired images, as in real experiments [40,41] (Supplementary Fig. 3c,d).

(Displaced caption, Figure 2c,d: no significant changes in AP are measured for F-actin fibers, but a significant increase is measured for N+S over O and N for F-actin rings (p-values in Supplementary Fig. 5). d) Images were progressively removed from the dataset (100%: 42 images, 75%: 31 images, 50%: 21 images, 25%: 10 images, 10%: 4 images). Removing more than 50% of the dataset for fibers negatively impacts the models, whereas removing 25% of the dataset negatively impacts the segmentation of rings (NS: non-significant; p-values in Supplementary Fig. 5). Adding synthetic images from pySTED during training allows 75% of the original training dataset to be removed without affecting performance for both structures (N+S, p-values in Supplementary Fig. 5).)
DL models are powerful tools to rapidly and efficiently analyse large databanks of images and perform various tasks such as cell segmentation [34,43]. When no pretrained models are readily available online to solve the task [44], finetuning or training a DL model from scratch requires the tedious process of annotating a dataset.
We herein aim to reduce the required number of distinct images for training by using pySTED as an additional data augmentation step. As a benchmark, we used the F-actin segmentation task from Lavoie-Cardinal et al. [42], where the goal is to segment dendritic F-actin fibers or rings using a small dataset (42 images) of STED images (Figure 2a, Methods). pySTED was first used as a form of data augmentation to increase the number of images in the training dataset without requiring new annotations. Using the U-Net datamap, we generated F-actin datamaps and a series of synthetic images in pySTED with various simulation parameters (Figure 2b, Supplementary Tab. 8).
We compared the segmentation performance, using the average precision (AP, Methods), of a DL model trained on the original dataset (O [42]) or with different image normalization and increased data augmentation (N). The segmentation performance was not impacted by increasing the amount of data augmentation (O vs. N, Figure 2c). Adding synthetic images from pySTED (N+S) into the training dataset to improve its diversity significantly increases the performance of F-actin rings segmentation compared to O and N, and maintains the performance for F-actin fibers segmentation (Figure 2c). In biological experiments, where each image is costly to acquire, reducing the size of the training dataset results in a higher number of images for the post-hoc analysis. Hence, we sought to measure the impact of reducing the number of real images in the training dataset by training on subsets of images that are augmented using pySTED (Supplementary Fig. 6). We measured a significant decrease of the AP for F-actin fibers when the model is trained on less than 50% of the images. Removing 25% of the dataset negatively impacts the segmentation performance for F-actin rings (Figure 2d, p-values in Supplementary Fig. 5). However, adding synthetic images from pySTED during training allows the segmentation performance of the model to be maintained while training with only 25% of the original dataset (Figure 2d, p-values in Supplementary Fig. 5).
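The augmentation strategy can be sketched as follows: each annotated datamap is re-rendered several times with randomized acquisition settings, so each annotation is reused while the image appearance varies. This is a toy illustration with assumed parameter ranges, not the values of Supplementary Tab. 8:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)

def augment_with_synthetics(datamaps, n_variants=4):
    """Re-image each annotated datamap several times with randomized
    simulation parameters (effective resolution, brightness, shot noise),
    multiplying the training set without new annotations."""
    synthetic = []
    for dm in datamaps:
        for _ in range(n_variants):
            fwhm = rng.uniform(1.5, 4.0)   # varying effective resolution (px)
            gain = rng.uniform(0.5, 2.0)   # varying brightness
            img = rng.poisson(gaussian_filter(dm * gain, fwhm / 2.355))
            synthetic.append(img.astype(float))
    return synthetic

# Three toy annotated datamaps yield 12 synthetic training images
maps = [np.pad(np.ones((8, 8)) * 50, 12) for _ in range(3)]
extra = augment_with_synthetics(maps)
```

The segmentation labels attached to each datamap carry over unchanged to every synthetic rendering, which is what removes the annotation cost.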

Validation of AI methods
Benchmarking AI models for automating microscopy tasks on biological samples is challenging due to biological variability and the difficulty of comparing imaging strategies on the same region of interest [6,17,45]. Assessing and comparing AI models requires multiple attempts in similar, yet different, experimental conditions to limit the impact of biological variability. This inevitably increases the number of biological samples and the time required to develop robust AI-assisted adaptive microscopy strategies that can be deployed on a variety of samples and imaging conditions. pySTED allows the simulation of multiple versions of the same images as if the structure had been imaged with different experimental settings. We herein showcase the capability of pySTED in thoroughly validating ML approaches for the optimization of STED imaging parameters in a simulated controlled environment, enabling more robust performance assessments and comparisons.

(Displaced caption, Figure 3: a) optimization of the imaging parameters on the same sequence of datamaps (200 images); two fluorophores are considered for demonstration purposes (Supplementary Tab. 9). b) Resulting imaging objectives from LinTSDiag at 3 different timesteps (10: cyan, 100: grey, 190: red) for 50 independent models, presented for increasing signal ratio (top to bottom); with time, LinTSDiag acquires images with a higher preference score for both fluorophores (purple contour lines) and converges into a similar imaging objective space (red points). c) The standard deviation (STD) of the imaging objectives and of the preference scores decreases during the optimization (cyan to red), supporting the convergence of LinTSDiag in a specific region of the imaging objective space for both fluorophores; the dashed line separates the imaging objectives (R: resolution, P: photobleaching, S: signal ratio) from the preference network (PN). d) Typical pySTED simulations on two different fluorophores (top/bottom) using the parameters optimized for fluorophore A (left) or B (right); parameters optimized for fluorophore A result in higher photobleaching while maintaining a similar resolution and signal ratio on fluorophore B compared to parameters optimized for fluorophore B; see Supplementary Tab. 9 for imaging parameters. e) Example acquisition of LinTSDiag on Tubulin in kidney epithelial cells (Vero cells) stained with STAR RED at the beginning (left) and at the end of the optimization (right). f) Over time, LinTSDiag manages to increase both the resolution and the signal ratio of the acquired images (35 images, cyan to red). g) LinTSDiag allows multi-color imaging due to its high-dimensional parameter space capability; LinTSDiag optimizes the averaged resolution and signal ratio from both channels in dual-color images of the Golgi (STAR ORANGE) and NPC (STAR RED) in Vero cells. h) LinTSDiag can maximize the signal ratio in the images while maintaining their resolution (35 images, cyan to red).)
We first demonstrate how pySTED can be used to characterize the performance of a multi-armed bandit optimization framework that uses Thompson Sampling (TS) for exploration, Kernel-TS. The application of Kernel-TS to the optimization of STED imaging parameters was demonstrated previously, but comparison between different experiments was challenging due to local variations in the size, brightness, and photostability of the fluorescently tagged neuronal structures [6]. Using synthetic images generated with pySTED allows the performance of Kernel-TS to be evaluated on the same image sequence (50 repetitions, Methods) and with controlled photophysical properties of fluorophores (Extended Fig. 2 and Supplementary Tab. 14). For experimental settings such as multi-channel imaging or adaptive scanning, Kernel-TS is limited by the number of parameters that can be simultaneously optimized (~4) in an online setting [6]. We thus turned to a neural network implementation of Thompson Sampling that was recently developed to solve the multi-armed bandit framework, LinTSDiag [46].
Using pySTED, we could characterize the performance of LinTSDiag on a microscopy optimization task on synthetic images without requiring real biological samples. As described above, LinTSDiag was trained on the same sequence (50 repetitions, Methods) using two different fluorophores (Figure 3a and Supplementary Tab. 14). In a simple 3-parameter optimization setting, LinTSDiag allows a robust optimization of the signal ratio, photobleaching, and spatial resolution for fluorophores with distinct photophysical properties (Figure 3b). We evaluate the performance of LinTSDiag using the preference score, which is obtained from a network that was trained to predict the preferences of an expert in the imaging objective space (PrefNet, see Methods) [6]. The convergence of the agent in the imaging objective space is supported by the smaller standard deviation measured in the last iterations of the imaging session (red lines, Figure 3c). pySTED enables the comparison of the optimized parameters for different fluorophores on the same datamap. This experiment confirms that optimal parameters vary depending on the photophysical properties (Figure 3d). LinTSDiag was then deployed on a real microscopy system to simultaneously optimize 4 parameters (excitation power, STED power, pixel dwell time, and linesteps) for the imaging of Tubulin stained with STAR RED in kidney epithelial cells (Vero cell line). The model was able to optimize the imaging objectives, improving the resolution and signal ratio while maintaining a low level of photobleaching over the course of the optimization (Figure 3e,f and Supplementary Tab. 14). We then sought to increase the number of parameters by tackling a dual-color imaging scheme (6 parameters: excitation power, STED power, and linesteps for both channels) for STED imaging of the Golgi stained with STAR ORANGE and the nuclear pore complex (NPC) stained with STAR RED in Vero cells (Figure 3g,h and Supplementary Tab. 14). The optimization framework allows 4 imaging
objectives to be simultaneously optimized (e.g. resolution and signal ratio for both colors). As the visual selection of the trade-off in a 4-dimensional space is challenging for the user in an online setting, we decided to optimize the combined resolution and signal ratio of both fluorophores (average of the imaging objectives), allowing the users to indicate their preference in a two-dimensional objective space. Online 6-parameter optimization with LinTSDiag increases the signal ratio while maintaining a good image resolution for both imaging channels (Figure 3h), enabling both structures to be resolved with sub-100 nm resolution.
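The bandit loop underlying both Kernel-TS and LinTSDiag can be illustrated with a plain linear Thompson Sampling sketch. This is a generic illustration, not the published implementations: the two "parameters" and the noisy reward below are toy stand-ins for imaging parameters and the preference score:

```python
import numpy as np

rng = np.random.default_rng(2)

def thompson_step(X, y, candidates, noise=1.0, prior=1.0):
    """One round of linear Thompson Sampling: sample a reward model from the
    Bayesian posterior, then pick the candidate the sample scores highest."""
    d = candidates.shape[1]
    A = prior * np.eye(d) + X.T @ X / noise     # posterior precision
    mu = np.linalg.solve(A, X.T @ y / noise)    # posterior mean
    theta = rng.multivariate_normal(mu, np.linalg.inv(A))  # posterior sample
    return int(np.argmax(candidates @ theta))

# Three candidate parameter settings; the (hidden) reward grows with the first
cands = np.array([[0.1, 0.5], [0.5, 0.5], [0.9, 0.5]])
X, y, choices = np.empty((0, 2)), np.empty(0), []
for _ in range(100):
    i = thompson_step(X, y, cands)
    reward = cands[i, 0] + rng.normal(0, 0.05)  # noisy observation
    X, y = np.vstack([X, cands[i]]), np.append(y, reward)
    choices.append(i)
```

Early rounds explore because the posterior is wide; as evidence accumulates, the sampled models agree and the loop concentrates on the best-scoring setting, which mirrors the shrinking standard deviations reported in Figure 3c.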
Next, we developed a model that leverages prior information (context) to solve a task with a high-dimensional action space. This is the case for DyMIN microscopy, which requires the parameter selection, in particular multiple illumination thresholds, to be adapted to the current region of interest [8] (Figure 4a,b). We previously showed that contextual-bandit algorithms can use the confocal image as a context to improve DyMIN threshold optimization in a two-parameter setting [47]. In this work, we aim to increase the number of parameters (7 parameters) that can be simultaneously optimized and validate the robustness of LinTSDiag [46] (Figure 4b). We repeatedly trained LinTSDiag on the same datamap sequence using the confocal image as prior information (50 repetitions). The parameter selection was compared by measuring whether the action selection correlated over time between the models (Figure 4c, Supplementary Fig. 7, Supplementary Tab. 14, and Methods). For instance, the correlation matrix from the last 10 images shows clusters of similar parameters that are better defined than for the first 10 images (Figure 4c). This is confirmed by the difference between the 90th and 10th quantiles of the correlation matrix, which rapidly increases with time (Figure 4d). As expected with clustered policies, the average standard deviation of the action selection for each cluster decreases over time, implying similar parameter selection by the models (Figure 4e). We also assessed whether the models would adapt their policies to different fluorophores (light/dark purple, Figure 4c,f). As shown in Figure 4f, there are specific policies for each fluorophore (e.g. fluorophore A: 0, 3; fluorophore B: 5), demonstrating the capability of the models to adapt their parameter selection to the experimental condition. While the policies of the models differ, the measured imaging objectives are similar for all clusters (Figure 4g), which suggests that different policies can solve this task, unveiling the delicate intricacies of DyMIN microscopy. More importantly, this shows that the model can learn one of the many possible solutions to optimize the imaging task.
The LinTSDiag optimization strategy was deployed in a real-life experiment for the 7-parameter optimization of DyMIN3D imaging of the post-synaptic protein PSD95 in dissociated primary hippocampal neurons stained with STAR-635P. Early in the optimization, the selected parameters produced images with poor resolution or missing structures (artefacts) (Figure 4h and Supplementary Tab. 14). The final images were of higher quality (right, Figure 4h), with fewer artefacts and higher resolution. The parameter selection of the model converged to a region of the parameter space that could improve all imaging objectives over the course of the optimization (Figure 4i,j). Parameters optimized with LinTSDiag allowed a significant improvement of DyMIN3D imaging of PSD95 compared to conventional 3D STED imaging (Supplementary Fig. 8). pySTED allowed us to validate the robustness of the model in a simulated environment prior to its deployment in a real experimental setting.

Learning through interactions with the system
Online optimization strategies such as Kernel-TS and LinTSDiag were trained from scratch on a new sample, implying a learning phase in which only a fraction of the images will meet minimal image quality requirements.
For costly biological samples, there is a need to deploy algorithms that can make decisions based on the environment with a reduced initial exploration phase. Control tasks and sequential planning are particularly well suited to a RL framework, in which an agent (e.g. replacing the microscopist) learns to make decisions by interacting with its environment. However, RL agents typically require millions of examples to learn a single task [27,29]. This makes them less attractive for training on real-world tasks, where each sample can be laborious to obtain (e.g. biological samples) or where unsuitable actions can lead to permanent damage (e.g. overexposure of the photon detector). Simulation platforms are thus essential in RL to provide environments in which an agent can be trained at low cost and then deployed in a real-life scenario [49], which is referred to as simulation-to-reality (Sim2Real) in robotics.
Here, pySTED is used as simulation software to train RL agents. We implemented pySTED in an OpenAI Gym environment (gym-STED) to facilitate the development and deployment of RL strategies for STED microscopy [25,50]. To highlight the potential of gym-STED to train a RL agent, we crafted the task of resolving nanostructures in simulated datamaps of various neuronal structures (Figure 5a). In gym-STED, an episode unfolds as follows. At each timestep, the agent observes the state of the sample: a visual input (image) and the current history (Methods, Figure 5b). The agent then performs an action (adjusting the pixel dwell time, excitation and depletion powers), receives a reward based on the imaging objectives, and transitions into the next state. A single-value reward is calculated using a preference network that was trained to rank the imaging objectives (resolution, photobleaching, and signal ratio) according to expert preferences [6] (Methods). A negative reward is obtained when the selected parameters lead to a high photon count that would be detrimental to the detector in real experimental settings (e.g. non-linear detection of photon counts). This sequence is repeated until the end of the episode (30 timesteps). In each episode, the goal of the agent is to balance detecting the current sample configuration against acquiring high-quality images to maximize its reward (Figure 5a). We trained a proximal policy optimization (PPO) [51] agent and evaluated its performance on diverse fluorophores (Methods). Domain randomization is used heavily within the simulation platform to cover a wide variety of fluorophores and structures, and thus increase the generalization properties of the agent [52]. In Figure 5c-f (Supplementary Tab. 15), we report the performance of the agent on a fluorophore with simulated photophysical properties that would result in high brightness (high signal ratio) and high photostability (low photobleaching) in real experiments. The results of the agent on other
simulated fluorophore properties are reported in the supplementary material (Supplementary Tab. 10 and Supplementary Fig. 9). Over the course of training, the agent adapts its policy to optimize the imaging objectives (100k and 12M training steps, Figure 5c). As expected from RL training, the reward of an agent during an episode is greater at the end of training compared to the beginning (red vs. cyan, Figure 5d).
When evaluated on a new sequence, the agent trained over 12M steps rapidly adapts its parameter selection during the episode to acquire images with high resolution and signal ratio, while minimizing photobleaching (Figure 5e,f). The agent shows a similar behavior for various simulated fluorophores (Supplementary Fig. 9).
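The episode structure described above follows the standard Gym interaction loop. The sketch below mimics that interface with a toy reward standing in for the preference network, including the detector-safety penalty; none of the constants correspond to gym-STED's actual values:

```python
import numpy as np

rng = np.random.default_rng(3)

class ToySTEDEnv:
    """Minimal stand-in for the gym-STED interface. Actions are
    (dwell time, excitation, depletion) scaled to [0, 1]; the reward is a toy
    preference score that peaks at an intermediate depletion power and turns
    negative when the simulated photon flux would saturate the detector."""

    def __init__(self, episode_length=30):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(3)  # placeholder for the image + history state

    def step(self, action):
        dwell, excitation, depletion = np.clip(action, 0.0, 1.0)
        photons = 1e4 * dwell * excitation          # toy photon count
        if photons > 8e3:
            reward = -1.0                           # detector-safety penalty
        else:
            reward = excitation * (1.0 - (depletion - 0.6) ** 2)
        self.t += 1
        done = self.t >= self.episode_length
        return np.array([dwell, excitation, depletion]), reward, done

env = ToySTEDEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    action = rng.uniform(0, 1, size=3)  # a PPO policy would act here
    obs, reward, done = env.step(action)
    total += reward
```

A PPO agent replaces the random policy above, using the cumulative episode reward as its training signal; because the environment is simulated, the millions of interactions that PPO needs come at no experimental cost.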
We compared the number of good images acquired by the RL agent with that of bandit optimization for the first 30 images of the optimization. In similar experimental conditions, with the same fluorophore and parameter search space, the average number of good images was 18 ± 3 for the RL agent and 5 ± 3 for the bandit (50 repetitions). This almost four-fold increase in the number of high-quality images highlights the improved efficiency of the RL agent at suggesting optimal imaging parameters.

(Displaced caption fragment, Figure 5a,b: ...(4), which then selects an action, i.e. the next imaging parameters (5); a STED image and a second confocal image are generated in pySTED (6); the imaging objectives and the reward are calculated (7); on the next timestep, the agent sees a new ROI, the previously simulated images, and the history of the episode. b) The state of the agent includes a visual input (the images) and the history. The visual input is the current confocal (CONFt) and the previous confocal/STED images (CONFt-1 and STEDt-1). The state also incorporates the laser excitation power at which the confocal image was acquired (c), the history of selected actions (at), and the calculated imaging objectives (Ot); the history vector is zero-padded to a fixed length. The agent encodes the visual information using a convolutional neural network (CNN) and the history using a fully connected linear layer (LN).)

Given the capability of the agent in acquiring images for a wide variety of synthetic imaging sequences, we evaluated whether the agent could be deployed in a real experimental setting. The experimental conditions chosen for the simulations were based on the parameter range available on the real microscope. Dissociated primary hippocampal neurons were stained for various neuronal proteins (Figure 6, Extended Fig.
3, and Supplementary Tab. 15) and imaged on a STED microscope with the RL agent's assistance for parameter choice. First, we evaluated the performance of our approach for Sim2Real on in-distribution images of F-actin and CaMKII in fixed neurons. While simulated images of both structures were available within the training environment, we wanted to evaluate whether the agent could adapt to real-life imaging settings (Supplementary Fig. 1). As shown in Figure 6a and Extended Figure 3, the agent resolves the nano-organization of both proteins (Supplementary Fig. 10). We sought to confirm whether the quality of the images was sufficient to extract biologically relevant features (Methods). For both proteins, the measured quantitative features matched values previously reported in the literature, enabling the resolution of the 190 nm periodicity of the F-actin lattice in axons and the size distribution of CaMKII nanoclusters [53,54,56] (Figure 6a and Extended Figure 3). Next, we wanted to validate that the agent would adapt its parameter selection to structures, fluorophore properties, or imaging conditions that were not included in the training set. We first observed that the agent could adapt to a very bright fluorescent signal and adjust the parameters to limit the photon counts on the detector (Extended Fig. 3). The morphology of the imaged PSD95 nanoclusters was in agreement with the values reported by Nair et al. [57] (Extended Fig. 3). We deployed the RL-based optimization scheme for the imaging of the mitochondrial protein TOM20 to evaluate the ability of the agent to adapt to out-of-distribution structures (Figure 6b). The nano-organization and morphology of TOM20 in punctate structures, previously described by Wurm et al. [55], is revealed using the provided imaging parameters in all acquired images (Figure 6b and Supplementary Fig.
10). Next, we evaluated the generalizability of the approach to a new imaging context: live-cell imaging. We used the optimization strategy for the imaging of the F-actin periodic lattice in living neurons (Figure 6c).
The quality of the acquired images is confirmed by the quantitative measurement of the periodicity, which matches the previously reported value of 190 nm from the literature [53,54]. Finally, we verified the generalizability of our approach by deploying our RL-assisted strategy on a new microscope and new samples (Figure 6d-e, Extended Fig. 4, and Supplementary Tab. 15). The agent successfully adapted to the new imaging conditions, rapidly acquiring high-quality images of the fluorescently-tagged proteins, even in high-photobleaching conditions such as STED microscopy of a green-emitting fluorophore. Using the pySTED simulation environment, we could successfully train RL agents that can be deployed in a variety of real experimental settings to tackle STED imaging parameter optimization tasks.

Discussion
We built pySTED, an in-silico super-resolution STED environment for the development and benchmarking of AI-assisted STED microscopy. Through synthetic and real experiments, we have demonstrated that it can be used to develop and benchmark AI approaches in optical nanoscopy. The Google Colab notebook that was created as part of this work can be used by microscopist trainees to develop their skills and intuition for STED microscopy before using the microscope for the first time.
The simulation platform was built to be versatile and modular. This allows users to create and test the efficiency of AI strategies and adaptive imaging schemes before deploying them on a real microscope. For instance, both DyMIN [8] and RESCue [58] microscopy are readily available to the users. Additionally, the community can contribute open-source modules that meet their experimental settings.
Smart microscopy requires tools and modules that increase the capabilities of the microscopes [10,59], which can be challenging to build when working on a real microscopy system. The development of simulation software is one way to mitigate the difficulty of building an AI-assisted microscopy setup. We designed specific experiments to demonstrate how pySTED can be used to address this problem in STED microscopy: developing and validating AI methods in a simulation environment greatly facilitates both tasks. We mainly focused on the selection of imaging parameters, which is one branch of AI-assisted microscopy, but also showed that pySTED can be successfully applied to data augmentation in supervised learning settings. A recent trend in microscopy focuses on the implementation of data-driven microscopy systems, for example, systems built to automatically select informative regions or improve the quality of the acquired images [60,61]. The development and validation of such data-driven systems could be achieved with pySTED.
We also tackled the training of an RL agent, which would be impossible without access to a large databank of simulated data. The RL agent enables a full automation of the imaging parameter selection and is deployed in gym-STED, an OpenAI gym environment built around pySTED [50]. Domain randomization was used heavily within the simulation platform [52], which resulted in an RL agent that could adapt its parameter selection to a wide variety of experimental conditions. Such strategies could be transformative to democratize STED microscopy for a larger diversity of experimental settings and allow non-expert users to acquire high-quality images on a new sample without previous optimization sessions.
While RL agents can represent a powerful tool to automate microscopy setups, they must be trained on a very large number of examples (e.g. 12M steps in this work) [27,29], which would be infeasible on a real microscopy setup. The pySTED simulation environment allowed the RL agent to bridge the gap between simulation and reality without requiring any fine-tuning. This makes pySTED an appealing platform for RL development, as it is particularly well suited for complex control tasks requiring temporally distinct trade-offs to be made. In this work, the model relied on a constant preference function to convert the multi-objective optimization into a single reward function. This preference function is ultimately user-dependent. It could be complemented in the future by incorporating RL from human feedback in the training of the RL model [62,63]. In future work, temporal dynamics could also be implemented in pySTED to open new possibilities to fully automatize the selection of informative regions and of imaging parameters in an evolving environment.

Figure 6: Bridging the reality gap between simulation and reality in RL by pretraining with pySTED. For all real microscopy experiments, the deployed agent was trained over 12M steps in simulation. The agent was deployed on a real STED microscope for the imaging of diverse proteins in dissociated neuronal cultures and cultivated Vero cells. a) Top: Simulated images of F-actin in fixed neurons were used during the training process. Deploying the RL agent to acquire an image of this in-distribution structure in a real experiment allows the periodic lattice of F-actin tagged with Phalloidin-STAR635 to be revealed in all acquired images. Bottom: Structural parameters are extracted from the acquired images (the dashed vertical line represents the median of the distribution) and compared to the values that were previously reported in the literature (solid vertical line). The agent has learned to adjust the imaging parameters to resolve the 190 nm
periodicity of the F-actin periodic lattice [53,54]. b) Top: The trained agent is tested on the protein TOM20, a structure that was never seen during training (out of distribution). The nano-organization of TOM20 is revealed in all acquired images. Bottom: The measured average cluster diameter of TOM20 concords with the average values reported by Wurm et al. [55]. c) Top: Live-cell imaging of SiR-Actin shows the capacity of the model to adapt to different experimental conditions (out of distribution). Bottom: The periodicity of the F-actin periodic lattice is measured from each acquired image and compared with the literature. See Material and Methods for the quantification. The STED images are normalized to their respective confocal image (CONF1). The second confocal image (CONF2) uses the same colorscale as CONF1 to reveal photobleaching effects. d,e) Images acquired by the RL agent in a real experiment on a different microscope. Tubulin was stained with the STAR RED fluorophore (d) and Actin was stained with STAR GREEN (e) in fixed Vero cells. The sequence of acquired images goes from top left to bottom right. The confocal images before (CONF1) and after (CONF2) are presented for photobleaching comparison. The CONF1 image is normalized to the CONF2 image. The STED images are normalized to the 99th percentile of the intensity of the CONF1 image. Images are 5.12 µm × 5.12 µm. The evolution of the parameter selection (left; STED: STED power, Exc.: Excitation power, Pdt.: Pixel dwelltime) and imaging objectives (right; R: Resolution, P: Photobleaching, S: Signal ratio) are presented, showing that optimal parameters and optimized objectives for Far-red (d) and Vis-STED (e) can differ greatly.

pySTED simulation platform
Two main software implementations are incorporated within the pySTED simulation platform: i) point spread function (PSF) calculation, and ii) emitter-light interactions.
PSF calculation PSF calculation in pySTED is inspired by previous theoretical work from Leutenegger et al. [1] and Xie et al. [2] (Figure 1b). As in Xie et al. [2], we calculate the excitation and depletion PSFs by using the electric field (Figure 1b). The effective PSF (E-PSF) is calculated by combining the excitation, depletion, and detection PSFs, where R is the radius of the imaged aperture [3] and ζ is the saturation factor of the depletion, defined as ζ = I_STED/I_s with I_s being the saturation intensity [1]. The left-hand side of equation 1 represents the probability that an emitter at position r contributes to the signal [4] and is calculated in pySTED using η·p_exc, where q_fl is the quantum yield of the fluorophore, σ_abs the absorption cross-section, φ_exc the photon flux from the excitation laser, and τ_STED the period of the STED laser. The η parameter allows the excitation probability to be modulated with the depletion laser or allows time-gating to be considered during the acquisition [1,5]. Time-gating consists in activating the detector within a small window of time (T_g, typically 8 ns) after the excitation pulse (T_del, typically 750 ps) to prominently detect photons coming from spontaneous emission. The simulations performed with pySTED follow the scheme of pulsed-STED microscopy, in which time-gating mostly reduces correlated background [5]. Following the derivation from Leutenegger et al.
[1] and assuming that T_g is smaller than τ_STED, the emission probability of a fluorophore is described by an expression in which k_S1 is the spontaneous decay rate, the effective saturation factor is given by ζk_vib/(ζk_S1 + k_vib), with k_vib the vibrational relaxation rate of S_0, and t_STED is the STED pulse width (Figure 1c,d). In the confocal case (I_STED = 0), the emission probability simply reduces to the spontaneous emission probability over the period T between STED pulses. This allows the probability of spontaneous decay η to be calculated using F(ζ)/F(0). The calculated E-PSF is convolved with the datamap to simulate the photons that are emitted and the ones measured by the detector.
In real experiments, the number of detected photons is affected by several factors (e.g. the photon detection and collection efficiency of the detector, the detection PSF, the fluorophore brightness, etc.), which were also integrated in the pySTED simulation environment (Supplementary Tab. 1-5). We also included the possibility of adding typical sources of noise that occur in a targeted microscopy experiment, such as shot noise, dark noise, and background noise, which are all modeled by Poisson processes (Supplementary Tab. 3).
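As a toy illustration of how these noise sources compose, the expected signal, dark counts, and background can each be drawn from an independent Poisson process. The function and parameter names below are illustrative, not the pySTED API:

```python
import numpy as np

rng = np.random.default_rng(42)

def detected_counts(expected_photons, dark_rate_hz, background_rate_hz, dwell_time_s):
    """Combine shot, dark, and background noise as independent Poisson draws.

    A toy sketch of the noise model described in the text; pySTED's
    actual implementation and parameterization differ.
    """
    shot = rng.poisson(expected_photons)                    # shot noise on the signal
    dark = rng.poisson(dark_rate_hz * dwell_time_s)         # detector dark counts
    background = rng.poisson(background_rate_hz * dwell_time_s)  # background photons
    return shot + dark + background
```

With all rates at zero the detector reports zero counts, and over many draws the mean converges to the expected photon number.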
Emitter-light interactions In a real microscopy experiment, the emitters can be degraded as they interact with the excitation or depletion light. Photobleaching is the process by which an emitter becomes inactive following light exposure [6]. In STED microscopy, this process is mainly caused by the combination of the excitation and depletion laser beams [6]. Photobleaching must be minimized during an imaging session to preserve sample health and sufficient imaging contrast. Hence, we implemented a realistic photobleaching model within the pySTED simulation software. The photobleaching model is based on the derivations from Oracz et al. [6], which were validated on real samples. Figure 1d presents the energy states, the decay rates, and the photobleaching state that are used within the photobleaching model.
As in Oracz et al. [6], we define the photobleaching rate as k_bleach = k_0·I + k_1·I^b, where k_0, k_1, and b depend on the fluorophore and have to be determined experimentally. In the default parameters of pySTED, we assume that the linear photobleaching term is null (k_0 = 0) and that photobleaching occurs only from S1 during the STED pulse. Other photobleaching parameters could easily be integrated considering the modular structure of pySTED. We define the effective photobleaching rate k_b as the number of emitters transitioning from the S1 state to the photobleached state (P) over the course of a laser period. In pySTED, the number of emitters N in a pixel is updated by sampling from a Binomial distribution with survival probability p = exp(−k_b·t) for a given dwell time t (Figure 1e). While most parameters can be obtained from the literature for a specific fluorophore, some parameters such as k_1 and b need to be determined experimentally [6]. Given some experimental data (or a priori knowledge about the expected photobleaching of a sample), we can estimate the photobleaching properties (k_1 and b) of a fluorophore using non-linear least-squares methods. We can also apply a similar process to estimate the absorption cross-section (σ_abs) of a fluorophore to optimize the confocal signal intensity to an expected value, as in Oracz et al. [6].
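The per-pixel emitter update described above can be sketched with numpy. This is a minimal sketch of the Binomial survival rule; pySTED's internal implementation differs:

```python
import numpy as np

rng = np.random.default_rng(0)

def bleach_update(n_emitters, k_b, dwell_time):
    """Update the emitter count per pixel after light exposure.

    Each emitter survives a dwell time t with probability
    p = exp(-k_b * t); the surviving count is drawn from a Binomial
    distribution, as described in the text.
    """
    p_survive = np.exp(-k_b * dwell_time)
    return rng.binomial(np.asarray(n_emitters), p_survive)
```

With a null photobleaching rate all emitters survive, while a very large rate bleaches every emitter.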

Realistic datamaps
A realistic datamap, which can be used in pySTED, is generated by predicting the position of emitters in a real super-resolved image. A U-Net model (U-Net datamap, implemented in PyTorch [7]) is trained to predict the underlying structure of a super-resolved image. U-Net datamap has a depth of 4 with 64 filters in the first double-convolution layer. Padding was used for each convolution layer to keep the same image size.
As in the seminal implementation of the U-Net [8], maxpooling with a kernel and stride of 2 was used. The number of filters in the double-convolution layers doubles at each depth. In the encoder part of the model, each convolution is followed by batch normalization and a Rectified Linear Unit (ReLU). Upsampling is performed using transposed convolutions. The decoder part of the model uses double-convolution layers, as in the encoder. At each depth of the model, features from the encoder are propagated using skip connections and concatenated with the features obtained following the upsampling layer. A last convolution layer is used to obtain a single image, followed by a sigmoid layer.
As previously mentioned, the goal of the U-Net is to predict the underlying structure of super-resolved images.
Training U-Net datamap in a fully-supervised manner requires a training dataset of paired super-resolved images and underlying structures. However, such a dataset does not currently exist. Mathematically, a microscopy image is obtained from the convolution of the microscope E-PSF with the positions of fluorophores in the sample. In the images from Durand et al. [9], the E-PSF of the microscope can be approximated by a Gaussian function with a full width at half maximum of ~50 nm. Hence, U-Net datamap can be trained to predict the datamap that, once convolved with the E-PSF, is similar to the input image (Figure 1f).
The L2 error between the Gaussian-convolved datamap and the original input image is used as the loss function to minimize.
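A minimal numpy/scipy sketch of this self-supervised loss, assuming a ~50 nm FWHM Gaussian approximation of the E-PSF and a hypothetical 20 nm pixel size (the paper implements the loss in PyTorch within the training loop):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def datamap_loss(predicted_datamap, image, fwhm_nm=50.0, pixel_nm=20.0):
    """Mean squared (L2) error between the Gaussian-convolved predicted
    datamap and the input STED image.

    Sketch of the self-supervised loss described in the text; the 20 nm
    pixel size is an assumption for illustration.
    """
    sigma_px = fwhm_nm / 2.355 / pixel_nm  # FWHM = 2*sqrt(2 ln 2) * sigma
    blurred = gaussian_filter(predicted_datamap, sigma_px)
    return np.mean((blurred - image) ** 2)
```

If the predicted datamap, once blurred by the approximated E-PSF, reproduces the input image exactly, the loss vanishes.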
To train the model, we used good-quality STED images of diverse neuronal proteins from an existing dataset [9] (quality > 0.7). Supplementary Tab. 7 presents the proteins imaged and the number of images that were used for training. Each 224 × 224 pixels image is augmented with three (3) 90° rotations. The Adam optimizer was used for training.

In the bandit optimization, the NSGA-II search could be restarted from scratch after each acquisition. However, given the high dimensionality of the parameter space, this may lead to high variability in the proposed parameter combinations. To reduce this variability, Deb et al. [22] proposed to keep a fraction of the previous options as a warm start of the NSGA-II search. In this work, 30% of the previous options are randomly sampled and used as starting points for the next NSGA-II search.
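The warm-start sampling can be sketched as follows. This is an illustrative sketch only; `previous_options` stands in for the previous NSGA-II population:

```python
import numpy as np

rng = np.random.default_rng(2)

def warm_start(previous_options, fraction=0.3):
    """Randomly sample a fraction of the previous NSGA-II options to
    seed the next search, as described in the text (30% by default)."""
    n = max(1, int(len(previous_options) * fraction))
    idx = rng.choice(len(previous_options), size=n, replace=False)
    return [previous_options[i] for i in idx]
```

The sampled options are then used as starting points for the next NSGA-II search, reducing run-to-run variability in the proposed parameter combinations.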
The resulting Pareto front of imaging objectives is shown to the preference articulation method.

Contextual LinTSDiag
The contextual version of LinTSDiag heavily relies on the implementation of LinTSDiag described above. In this work, the contextual information was used to solve a DyMIN microscopy task. As previously mentioned, the confocal image serves as contextual information, but any other contextual information pertinent to the task could be provided to the model. The confocal image is encoded with a two-layer convolutional neural network. A first convolution layer with 8 filters, a kernel size of 3, and padding of 1 is followed by a batch normalization layer, a maxpooling layer (size 4; stride 4), and a ReLU activation. A second convolution layer with 16 filters is followed by a batch normalization layer. Global average pooling is used to generate a vector embedding, followed by a ReLU activation and a dropout layer with a probability of 0.2. The embedding is projected to 32 features using a fully connected layer, followed by a ReLU activation and a dropout layer with a probability of 0.2. The contextual features are concatenated with the parameter features (described in LinTSDiag). A single-hidden-layer fully-connected model with a hidden size of 64 is used to predict the imaging objectives, with a ReLU activation at the hidden layer. A single contextual encoder is created and shared between the imaging objectives. The same training procedure and NSGA-II search are used as in LinTSDiag.
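The encoder described above can be sketched in PyTorch. This is a sketch reconstructed from the architectural description only: class names are ours, `n_param_features` stands in for the LinTSDiag parameter features, and each predictor here outputs a single imaging objective:

```python
import torch
from torch import nn

class ContextEncoder(nn.Module):
    """Two-layer CNN encoding the confocal image into 32 context features,
    following the description in the text."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 8 filters, k=3, pad=1
            nn.BatchNorm2d(8),
            nn.MaxPool2d(kernel_size=4, stride=4),
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # 16 filters
            nn.BatchNorm2d(16),
        )
        self.head = nn.Sequential(
            nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(16, 32),                           # project to 32 features
            nn.ReLU(), nn.Dropout(0.2),
        )

    def forward(self, confocal):
        h = self.features(confocal)
        h = h.mean(dim=(2, 3))   # global average pooling -> (B, 16)
        return self.head(h)      # (B, 32)

class ObjectivePredictor(nn.Module):
    """Concatenates context and parameter features; a single hidden
    layer of size 64 predicts one imaging objective."""
    def __init__(self, n_param_features, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(32 + n_param_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, context, params):
        return self.net(torch.cat([context, params], dim=1))
```

Because global average pooling removes the spatial dimensions, the encoder accepts confocal crops of any size; the single encoder instance is shared across all objective predictors.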

Preference articulation
The optimization algorithms output possible trade-offs between the imaging objectives.The preference articulation step consists in selecting the trade-off that is the most relevant for the task.Two preference articulation methods were used in the bandit optimization: manual selection and automatic selection [9].
Manual selection This method requests a manual input from the microscopist at each image acquisition.
The microscopist is asked to select the trade-off that is in line with their own preferences from the available options (point cloud). This method was used in all experiments on the real microscope using the bandit optimization scheme (Figure 3e-h and Figure 4h-j).
Automatic selection This method aims at reducing the number of interventions from the microscopist in the optimization loop by learning their preferences prior to the optimization session.In Durand et al.

Reinforcement learning experiments
An RL agent interacts with an environment by sequentially making decisions based on its observations. At each timestep t, the agent observes a state s_t and selects an action a_t; a reward r_t is provided to the agent as feedback. The goal of the agent is to maximize the cumulative reward over the course of an episode, i.e. over the trajectory τ = (s_t, a_t, s_t+1, a_t+1, ...). Formally, the cumulative reward may be written as R(τ) = Σ_t γ^t·r_t.

RL formulation
where γ is a discount factor in the range [0, 1] used to temporally weight the rewards. Intuitively, using a discount factor close to 1 implies that the credit assignment of the current action is important for future rewards, which is the case for long planning horizons, while a discount factor close to 0 reduces the impact of temporally distant rewards [23].
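The discounted cumulative reward can be computed with a simple backward recursion (a minimal sketch):

```python
def discounted_return(rewards, gamma):
    """Cumulative discounted reward R = sum_t gamma^t * r_t,
    computed backwards so each step costs one multiply-add."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, three unit rewards with γ = 0.5 give 1 + 0.5 + 0.25 = 1.75.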

Reward function
The optimization of super-resolution STED microscopy is a multi-objective problem (e.g. Resolution, Signal Ratio, and Photobleaching). However, the conventional RL settings and algorithms assume access to a reward function that is single-valued, in other words a single-objective optimization [23]. Several methods were introduced to solve the multi-objective RL setting, for instance by simultaneously learning multiple policies or by using a scalarisation function (see Hayes et al. [24] for a comprehensive review). The scalarisation function is simple to implement and allows all of the algorithms that were developed for RL to be used, but assumes that the preferences of the user are known a priori.
In this work, the multi-objective RL setting was transformed into a single scalar reward by using the neural network model PrefNet [9] that was developed in the bandit experiments. The PrefNet model was trained to reproduce the trade-offs that an expert is willing to make in the imaging objective space. The PrefNet model does so by assigning a value to a combination of imaging objectives. The values predicted by the model for a combination of objectives are arbitrary, but the ranking of these values is accurate. Hence, the values from the PrefNet model are proportional to the image quality. The reward of the agent can then be defined using equation 16. As a safety precaution when deploying the agent on a real microscopy system, the agent incurs a reward of -10 when the frequency of photons on the detector is higher than 20 MHz:

r_t = -10 if f_photons > 20 MHz, and r_t = PrefNet(R, P, S) otherwise. (16)

While the negative reward can be used to limit the selection of actions that could damage the microscope, it is not required. For instance, the results shown in Figure 6d-e and Extended Fig. 4 used a version of the reward function that did not include the negative reward. It is worth noting that in these cases, the range of parameters should be carefully selected to avoid damaging the microscope.
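Equation 16 can be sketched as a simple guard around the PrefNet score. Here `prefnet` is any callable scoring the (Resolution, Photobleaching, Signal ratio) triplet; names are illustrative:

```python
def compute_reward(objectives, photon_rate_mhz, prefnet, max_rate_mhz=20.0):
    """Scalar reward of equation 16: a -10 penalty when the detector
    photon rate exceeds the safety threshold (20 MHz), otherwise the
    PrefNet score of the imaging objectives. Sketch only; `prefnet`
    stands in for the trained preference network."""
    if photon_rate_mhz > max_rate_mhz:
        return -10.0
    return prefnet(*objectives)
```

The penalty branch takes priority regardless of how good the objectives are, which is the intended safety behavior.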
Agent The Proximal Policy Optimization (PPO) model [25] was used for all RL experiments. PPO is considered state-of-the-art for many control tasks and is widely used in robotics [26]. PPO allows a continuous action space, making it suitable for the task of microscopy parameter tuning. It is an on-policy algorithm, meaning that the same policy is used during the data collection and updating phases. The model uses a deep neural network to map the state to the actions. Since PPO is an actor-critic method, it simultaneously learns a policy (the actor) and a value function (the critic).
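The on-policy interaction loop underlying this training can be sketched with a stand-in environment (purely illustrative: the actual environment is gym-STED and the agent is a PPO model, neither of which is shown here):

```python
class DummyEnv:
    """Stand-in environment with a fixed-length episode; the reward
    simply echoes the chosen action. Illustrative only."""
    def __init__(self, horizon=5):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return float(self.t), float(action), done, {}

def run_episode(env, policy):
    """Collect one episode with the current policy (on-policy rollout)."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(obs)             # policy maps observation -> action
        obs, reward, done, _ = env.step(action)
        total += reward
    return total
```

In PPO training, such rollouts are collected with the current policy and then used for the update phase before the next batch of rollouts is gathered.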

Neuronal cell culture
Neuronal cultures from the hippocampus were obtained using neonatal Sprague Dawley rats, adhering to the animal care guidelines set by Université Laval. The rats, aged P0-P1, were euthanized through decapitation before the hippocampi were dissected. The cells were then seeded onto 12 and 18 mm coverslips coated with poly-D-lysine and laminin, for fixed (12 mm, 40,000 cells/coverslip) and live-cell (18 mm, 100,000 cells/coverslip) STED imaging. Neurons were cultivated in a growth medium composed of Neurobasal and B27 (in a 50:1 ratio), enriched with penicillin/streptomycin (25 U/mL; 25 µg/mL) and 0.5 mM L-GlutaMAX (Invitrogen). Ara-C (5 µM; Sigma-Aldrich) was added to the medium after five days to limit the proliferation of non-neuronal cells. Twice a week, ~50% of the growth medium was replaced with serum- and Ara-C-free medium. Cells were used between Days In Vitro (DIV) 12-16 for experiments.
Sample preparation and staining procedures Fixation was performed for 10 minutes in a 4% PFA solution (PFA 4%, sucrose 4%, phosphate buffer 100 mM, Na-EGTA 2 mM). Neurons were permeabilized with 0.1% Triton X-100, and nonspecific binding sites were blocked for 30 min with PBS 20 mM and 2% goat serum. Primary and secondary antibodies were successively incubated for 2 h and 1 h, respectively.
Phalloidin was incubated for 1h.All incubations were done at room temperature, in the blocking solution.

Quantification of biological structures
F-actin Line profiles of ~1 µm were manually extracted from each image. A linewidth of 3 pixels was used to average the profile values. The autocorrelation function (statsmodels library [30]) was calculated from the intensity profile. The periodicity of the signal was determined from the first peak maximum.
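The periodicity measurement can be sketched with a dependency-light numpy autocorrelation (the paper uses the statsmodels implementation; the function below is an illustrative equivalent, and the pixel size argument is ours):

```python
import numpy as np

def lattice_periodicity(profile, pixel_size_nm):
    """Estimate the signal periodicity (in nm) from the first local
    maximum of the autocorrelation of a 1D intensity line profile."""
    x = np.asarray(profile, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags >= 0
    ac = ac / ac[0]                                    # normalize, ac[0] == 1
    # first local maximum after lag 0 gives the dominant period
    for lag in range(1, len(ac) - 1):
        if ac[lag - 1] < ac[lag] >= ac[lag + 1]:
            return lag * pixel_size_nm
    return None
```

For a profile with a 10-pixel period sampled at 19 nm/pixel, the first autocorrelation peak lands at lag 10, i.e. a 190 nm periodicity.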

CaMKII-and PSD95
We segmented clusters using the wavelet segmentation implementation from Wiesner et al. [31]. The scales used were (1, 2) for STED and (3, 4) for confocal segmentation. A threshold of 200 was used. Small segmented objects (<3 pixels) were removed and small holes (<6 pixels) were filled. In the STED image segmentation, only the objects part of the confocal foreground were considered. For STED segmentation, watershed was used to split merged segmented objects, with the local peak maxima used as initial seeds. Small segmented objects (<3 pixels) resulting from the watershed split were filtered out. The properties of each segmented object were extracted using regionprops from the scikit-image Python library [32].
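The small-object filtering step can be sketched with scipy.ndimage. This illustrates only the post-processing of the binary mask; the wavelet segmentation itself follows Wiesner et al. [31], and the paper uses scikit-image for the watershed split and regionprops:

```python
import numpy as np
from scipy import ndimage as ndi

def clean_clusters(mask, min_size=3):
    """Label a binary segmentation mask and drop objects smaller than
    `min_size` pixels (the <3 pixel filter described in the text).
    Returns (labeled_array, n_objects)."""
    labels, n = ndi.label(mask)
    sizes = ndi.sum(mask, labels, index=np.arange(1, n + 1))  # pixels per object
    keep = np.flatnonzero(sizes >= min_size) + 1              # surviving labels
    cleaned = np.isin(labels, keep)
    return ndi.label(cleaned)
```

A mask containing one 5-pixel object and one 2-pixel object is reduced to a single labeled cluster.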
TOM20 A similar approach to the one in Wurm et al. [33] was used. Briefly, the confocal foreground of each mitochondrion was extracted using the same wavelet segmentation procedure as for CaMKII- and PSD95. The 2D autocorrelation on square crops of 320 nm × 320 nm centered on each mitochondrion was calculated. The diameter of a TOM20 cluster is defined as the standard deviation obtained from a 2D Gaussian fit of the autocorrelation profile. See Supplementary Tab. 9 for imaging parameters.
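The autocorrelation-based size estimate can be sketched as follows. This is an illustrative sketch: `crop` stands in for a square crop centered on a mitochondrion, a 1D central profile replaces the full 2D Gaussian fit, and the exact fitting procedure in the paper may differ:

```python
import numpy as np
from scipy.optimize import curve_fit

def cluster_diameter(crop, pixel_size_nm):
    """Estimate cluster size (nm) as the standard deviation of a
    Gaussian fitted to the central profile of the 2D autocorrelation
    of the crop (computed via FFT, with circular wraparound)."""
    c = crop - crop.mean()
    f = np.fft.fft2(c)
    ac = np.fft.fftshift(np.fft.ifft2(f * np.conj(f)).real)
    prof = ac[ac.shape[0] // 2]          # central row of the autocorrelation
    prof = prof / prof.max()
    lags = (np.arange(len(prof)) - len(prof) // 2) * pixel_size_nm
    gauss = lambda x, s: np.exp(-x ** 2 / (2 * s ** 2))
    (sigma,), _ = curve_fit(gauss, lags, prof, p0=[2 * pixel_size_nm])
    return abs(sigma)
```

As a sanity check, a broader synthetic spot yields a larger fitted standard deviation than a narrow one.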

Figure 1: pySTED simulation platform. a) Schematic of the pySTED microscopy simulation platform. The user specifies the fluorophore properties (e.g. brightness and photobleaching) and the positions of the emitters in the datamap. A simulation is built from several components (excitation and depletion lasers, detector, and objective lens) that can be configured by the user according to their experimental settings. A low-resolution (Conf) or high-resolution (STED) image of an underlying datamap is simulated using the provided imaging parameters. The number of fluorophores on each pixel in the original datamap is updated according to their photophysical properties and associated photobleaching effects. b) Modulating the excitation with the depletion beam impacts the effective PSF (E-PSF) of the microscope. The E-PSF is convolved with the datamap to calculate the number of photons. c) A time-gating module is implemented in pySTED. The temporal acquisition scheme of the simulation can be modulated by the user and affects the lasers and the detection unit. The time-gating parameters of the simulation (gating delay: T_del and gating time: T_g) as well as the repetition rate of the lasers (τ_rep) are presented. A grey box is used to indicate when a component is active. d) A two-state Jablonski diagram (ground state: S0 and excited state: S1) presents the transitions that are included in the fluorescence (spontaneous decay: k_S1 and stimulated emission decay: k_STED) and photobleaching dynamics (photobleaching rate: k_b and photobleached state: P) of pySTED. The vibrational relaxation rate (1/τ_vib) affects the effective saturation factor in STED. e) An image acquisition is simulated as a two-step process where, for each position in the datamap, we do the following: i, Acquire) The weighted sum of the E-PSF with the number of emitters in the datamap (Datamap - Emitters) is calculated to obtain the signal intensity and is reported in the image (Image - Photons). ii, Photobleaching) The number of
emitters at each position in the datamap is updated according to the photobleaching probability (line profile from k_b, compare top and bottom lines). The colormaps used in a are also employed for the datamap and image in e and f. f) Realistic datamaps are generated from real images. A U-Net model is trained to predict the underlying structure from a real STED image. Convolving the predicted datamap with the approximated PSF results in a realistic synthetic image. During training, the mean squared error loss (MSELoss) is calculated between the real and synthetic images. Once trained, the convolution step can be replaced by pySTED.

2.2 pySTED as a development platform for AI-assisted microscopy

2.2.1 Dataset augmentation for training deep learning models

Figure 2: pySTED is used to artificially augment the training dataset of a DL model. a) We tackle the segmentation task that was used in Lavoie-Cardinal et al. [42], where the annotations consist of polygonal bounding boxes around F-actin fibers (magenta) and rings (green). b) pySTED is used to augment the training dataset by generating synthetic versions of a STED image. c) Average Precision (AP) of the model for the segmentation of F-actin fibers (magenta) and rings (green). The model was trained on the original dataset from Lavoie-Cardinal et al. [42] (O), and on the same dataset with updated normalization (N) and additional synthetic images (N+S). No significant changes in AP are measured for F-actin fibers, but a significant increase is measured for N+S over O and N for F-actin rings (p-values in Supplementary Fig. 5). d) Images were progressively removed from the dataset (100%: 42 images, 75%: 31 images, 50%: 21 images, 25%: 10 images, and 10%: 4 images). Removing more than 50% of the dataset negatively impacts the models for fibers, whereas removing 25% of the dataset negatively impacts the segmentation of rings (N, NS: non-significant; p-values in Supplementary Fig. 5). Adding synthetic images from pySTED during training allows 75% of the original training dataset to be removed without affecting the performance for both structures (N+S).


Figure 3: Validation of AI-assisted algorithms with pySTED for STED microscopy parameter optimization. a) pySTED is used to confirm the robustness of a model to the random initialization by repeatedly optimizing (50 repetitions) the imaging parameters on the same sequence of datamaps (200 images). Two fluorophores are considered for demonstration purposes (Supplementary Tab. 9). b) Resulting imaging objectives from LinTSDiag at 3 different timesteps (10: cyan, 100: grey, and 190: red) for 50 independent models, presented for increasing signal ratio (top to bottom). With time, LinTSDiag acquires images that have a higher preference score for both fluorophores (purple contour lines) and converges into a similar imaging objective space (red points). c) The standard deviation (STD) of the imaging objectives and of the preference scores decreases during the optimization (cyan to red), supporting the convergence of LinTSDiag in a specific region of the imaging objective space for both fluorophores. The dashed line separates the imaging objectives (R: Resolution, P: Photobleaching, and S: Signal ratio) from the preference network (PN). d) Typical pySTED simulations on two different fluorophores (top/bottom) using the parameters optimized on fluorophore A (left) or B (right). Parameters that were optimized for fluorophore A (top-left) result in higher photobleaching while maintaining a similar resolution and signal ratio on fluorophore B (bottom-left), compared to parameters that were optimized for fluorophore B (bottom-right). See Supplementary Tab.
9 for imaging parameters. e) Example acquisition of LinTSDiag on Tubulin in kidney epithelial cells (Vero cells) stained with STAR RED, at the beginning (left) and at the end of the optimization (right). f) Over time, LinTSDiag manages to increase both the resolution and the signal ratio of the acquired images (35 images, cyan to red). g) LinTSDiag enables multi-color imaging due to its high-dimensional parameter space capability. LinTSDiag optimizes the averaged resolution and signal ratio from both channels in dual-color images of Golgi (STAR ORANGE) and NPC (STAR RED) in Vero cells. h) LinTSDiag can maximize the signal ratio in the images while maintaining their resolution (35 images, cyan to red).

Figure 4: Validation of contextual-bandit algorithms with pySTED in a high-dimensional parameter space. a) DyMIN microscopy uses thresholds to turn off the high-intensity STED laser when no structures are within the vicinity of the donut beam (white regions), thus limiting the light dose at the sample compared to conventional STED. b) Typically, DyMIN uses a 3-step process at each pixel. In the first step, only the excitation (Exc.) laser is used and the signal is measured. If the measured signal is higher than the predefined threshold (Threshold 1) after the decision time (Decision Time 1), then the depletion power (STED) is slightly increased and the signal is measured again (Threshold 2 and Decision Time 2). Otherwise, the acquisition is stopped until the next pixel. The final step (Step 3) consists in a normal STED acquisition. c) pySTED was used to characterize LinTSDiag models that can simultaneously optimize 7 parameters (STED and excitation powers, pixel dwelltime, thresholds 1 & 2, and decision times 1 & 2) with prior information about the task (confocal image). The convergence of the models to similar parameter combinations is evaluated by measuring the correlation in the action selection (50 models) over time (see Supplementary Fig. 7). Clustering of the correlation matrix reveals clusters of policies that are better defined later in the optimization process (right dendrogram, color-coded). The shades of purple on the left of the correlation matrix represent two different fluorophores (light: A, dark: B).
d) The difference between the 90 th and 10 th quantile of the correlation matrix increases with time implying better defined clusters of policies.e) The intra cluster standard deviation (STD) of the parameter selection decreases during the optimization showing that the policy of the models converges in all defined clusters.f) The proportion of models per cluster for fluorophore A or B (light and dark respectively) shows that there are different modes of attraction in the parameter space for fluorophores with distinct photophysical properties (color-code from c). g) While models converged in different regions of the parameter space, the measured imaging objectives (R: Resolution, A: Artefact, P: Photobleaching) are similar for each cluster (colorcode from c). h) Example acquisition with LinTSDiag optimization on a real acquisition task for DyMIN3D of the synaptic protein PSD95 in cultured hippocampal neurons.The volume size is 2.88 µm ⇥ 2.88 µm ⇥ 2 µm.Confocal (left) and DyMIN (right) acquisitions are displayed.i) A convergence of the parameter selection in the 7-parameter space is observed (cyan to red, STED: STED power, Exc.: Excitation power, Pdt.: Pixel dwelltime, Th1-2: First and second DyMIN threshold, and T1-2: First and second DyMIN decision time).j) LinTSDiag optimization reduces the variability of all imaging objectives during the optimization (50 images).Boxplot shows the distribution in bins of 10 images.
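The stepwise per-pixel logic described in Figure 4b can be summarized as a short control-flow sketch. This is illustrative only, not microscope control code: the `measure_signal` callback and its arguments are hypothetical placeholders for the hardware-level probing at increasing STED power.

```python
def dymin_pixel(measure_signal, thresholds, decision_times, dwell_time):
    """Sketch of DyMIN's per-pixel decision logic (illustrative only).

    measure_signal(step, duration) is a hypothetical callback returning
    the photon count measured at the STED power level of `step`.
    """
    # Probe steps: low/intermediate STED power for a short decision time;
    # abort the pixel if the signal stays below the step's threshold.
    for step, (threshold, t_decision) in enumerate(zip(thresholds, decision_times)):
        if measure_signal(step, t_decision) < threshold:
            return None  # no structure nearby: skip the full STED exposure
    # Final step: regular STED acquisition for the full pixel dwell time.
    return measure_signal(len(thresholds), dwell_time)
```

With the standard 3-step DyMIN of Figure 4b, `thresholds` and `decision_times` each hold two entries (steps 1 and 2), and the final call corresponds to step 3.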

Figure 5: An RL agent is trained to optimize the STED imaging parameters in simulation with pySTED. a) Schematic of the RL training loop in simulation. Each episode starts by sampling a set of photophysical properties representing a fluorophore (1) and the selection of a structural protein from the databank (2). At each timestep, a region of interest (ROI) is selected: a datamap is created and a confocal image is generated with pySTED (3). The confocal image is used in the state of the agent (4), which then selects an action, i.e. the next imaging parameters (5). A STED image and a second confocal image are generated in pySTED (6). The imaging objectives and the reward are calculated (7). On the next timestep, the agent sees a new ROI, the previously simulated images, and the history of the episode. b) The state of the agent includes a visual input (the images) and the history. The visual input of the agent is the current confocal image (CONFt) and the previous confocal/STED images (CONFt-1 and STEDt-1). The state of the agent also incorporates the laser excitation power at which the confocal image was acquired (c), the history of selected actions (at), and the calculated imaging objectives (Ot). The history vector is zero-padded to a fixed length (⟨0⟩). The agent encodes the visual information using a convolutional neural network (CNN) and the history using a fully connected linear layer (LN). Both encodings are concatenated and fed to a LN model which predicts the next action. c) Evolution of the policy (left; STED: STED power, Exc.: Excitation power, Pdt.: Pixel dwell time) and imaging objectives (right; R: Resolution, P: Photobleaching, S: Signal ratio) for a fluorophore with high-signal and low-photobleaching properties at the beginning (cyan, 100k timesteps) and at the end (red, 12M timesteps) of the training process. A boxplot shows the distribution of the average value from the last 10 images of an episode (30 repetitions). d) Evolution of the reward during an episode at the beginning (cyan, 100k timesteps) and at the end of training (red, 12M timesteps) for the same fluorophore properties as in c). e) Evolution of the policy (left) and imaging objectives (right) after training (12M timesteps) during an episode for a fluorophore with the same photophysical properties as in c). f) Typical examples of images acquired during an episode. The image index is shown in the top right corner and the calculated imaging objectives in the top left corner. The STED image and second confocal (CONF2) image are normalized to their respective first confocal (CONF1) images.
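The training loop of Figure 5a follows the standard agent-environment interface of RL. A minimal sketch is shown below; the `env` and `agent` method names are generic placeholders for illustration, not the pySTED API:

```python
def run_episode(env, agent, max_steps=30):
    """Generic RL episode mirroring steps (1)-(7) of the schematic."""
    state = env.reset()           # (1)-(3): sample fluorophore, structure, ROI; confocal image
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)               # (4)-(5): pick the next imaging parameters
        state, reward, done = env.step(action)  # (6)-(7): simulate STED, compute objectives/reward
        total_reward += reward
        if done:
            break
    return total_reward
```

Because pySTED plays the role of `env`, millions of such episodes can be run in simulation before any agent touches a real microscope.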

Figure 6: Bridging the reality gap between simulation and reality in RL by pretraining with pySTED. For all real microscopy experiments, the deployed agent was trained over 12M steps in simulation. The agent was deployed on a real STED microscope for the imaging of diverse proteins in dissociated neuronal cultures and cultured Vero cells. a) Top: Simulated images of F-actin in fixed neurons were used during the training process. Deploying the RL agent to acquire an image of this in-distribution structure in a real experiment allows the periodic lattice of F-actin tagged with Phalloidin-STAR635 to be revealed in all acquired images. Bottom: Structural parameters are extracted from the acquired images (the dashed vertical line represents the median of the distribution) and compared to the values previously reported in the literature (solid vertical line). The agent has learned to adjust the imaging parameters to resolve the 190 nm periodicity of the F-actin periodic lattice [53,54]. b) Top: The trained agent is tested on the protein TOM20, a structure that was never seen during training (out of distribution). The nano-organization of TOM20 is revealed in all acquired images. Bottom: The measured average cluster diameter of TOM20 concords with the average values reported by Wurm et al. [55]. c) Top: Live-cell imaging of SiR-Actin shows the capacity of the model to adapt to different experimental conditions (out of distribution). Bottom: The periodicity of the F-actin periodic lattice is measured from each acquired image and compared with the literature. See Material and Methods for the quantification. The STED images are normalized to their respective confocal image (CONF1). The second confocal image (CONF2) uses the same colorscale as CONF1 to reveal photobleaching effects. d,e) Images acquired by the RL agent in a real experiment on a different microscope. Tubulin was stained with the STAR RED fluorophore (d) and Actin was stained with STAR GREEN (e) in fixed Vero cells. The sequence of acquired images goes from top left to bottom right. The confocal images before (CONF1) and after (CONF2) are presented for photobleaching comparison. The CONF1 image is normalized to the CONF2 image. The STED images are normalized to the 99th percentile of the intensity of the CONF1 image. Images are 5.12 µm × 5.12 µm. The evolution of the parameter selection (left; STED: STED power, Exc.: Excitation power, Pdt.: Pixel dwell time) and imaging objectives (right; R: Resolution, P: Photobleaching, S: Signal ratio) is presented, showing that optimal parameters and optimized objectives for far-red (d) and Vis-STED (e) can differ greatly.
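The display normalization described for panels d,e — scaling the STED frame by the 99th percentile of the first confocal image — can be sketched in a few lines (the function name is ours, for illustration):

```python
import numpy as np

def normalize_sted_for_display(sted, conf1):
    """Scale a STED image by the 99th percentile of its confocal
    reference, as used for display in Fig. 6 d,e (sketch)."""
    scale = np.percentile(conf1, 99)
    return sted / scale if scale > 0 else sted
```

Using a high percentile rather than the maximum makes the scaling robust to isolated hot pixels in the confocal reference.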

In Durand et al. [9], the neural network implementation PrefNet was used to learn the preferences from an Expert. In the current work, two PrefNet models were trained from the preferences of an Expert, using the same model architecture and training procedure as in Durand et al. [9]. A first model is trained for the STED optimization to select among the resolution, photobleaching, and signal ratio imaging objectives. A second model is trained for the DyMIN optimization to select the trade-off between resolution, photobleaching, and artefact. The PrefNet model is used to repeatedly make the trade-offs in multiple optimizations in the simulation environment (Figure 3b-d and Figure 4c-g).
learns a policy function and a value function that measures the quality of a selected action. Both functions use the same model architecture. A convolutional neural network (CNN) extracts information from the visual inputs and a linear neural network (LN) extracts information from the history of the episode. The CNN encoder is similar to the one used in Mnih et al. [27]. The encoder is composed of 3 convolutional layers, each followed by a leaky ReLU activation. The kernel sizes of the layers are 8, 4, and 3, with strides of 4, 2, and 1, which reduces the spatial size of the state space. The LN model contains 2 linear layers projecting to sizes 16 and 4. The information from both encoders is concatenated and mapped to the action space using a LN layer. During training, the Adam optimizer is used with default parameters and a learning rate of 1 × 10⁻⁴. The batch size of the model is set at 64. Every 512 steps in the environment, the model is trained for 10 batches randomly sampled from the previous 512 steps. A maximal gradient of 1.0 is enforced during backpropagation to stabilize training.
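The spatial reduction produced by the stated kernel sizes and strides can be verified directly with the standard valid-convolution size formula (helper names are ours). Note that with an 84 × 84 input, this encoder reproduces the 7 × 7 feature maps of the Atari encoder of Mnih et al. [27]:

```python
def conv_out(size, kernel, stride):
    """Output spatial size of a valid (no padding) convolution."""
    return (size - kernel) // stride + 1

def cnn_encoder_size(size, layers=((8, 4), (4, 2), (3, 1))):
    """Spatial size after the 3 conv layers described in the text
    (kernels 8, 4, 3 with strides 4, 2, 1)."""
    for kernel, stride in layers:
        size = conv_out(size, kernel, stride)
    return size
```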
The general problem in RL is formalized by a discrete-time stochastic control process, i.e. it satisfies a Markov Decision Process (MDP). An agent starts in a given state s_t ∈ S and gathers some partial observations o_t ∈ O. In an MDP, the state is fully observable, that is, the agent has access to a complete observation of a state s_t. At each time step t, the agent performs an action a_t ∈ A given some internal policy π, after which the agent receives a reward r_t ∈ R and transitions to a state s_{t+1} ∈ S with a state transition function T(s_{t+1}|s_t, a_t). Following the state transition, a reward signal r_t = R(s_t, a_t, s_{t+1}) is computed.
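The notation above can be made concrete with a toy two-state MDP. Everything here (states, actions, probabilities) is invented for illustration and has no relation to the actual imaging task:

```python
import random

# Toy MDP illustrating S, A, T(s'|s,a) and R(s,a,s').
S = ["low_signal", "high_signal"]
A = ["raise_power", "keep_power"]

def T(s, a):
    # Stochastic transition kernel: raising the power tends to
    # move the system into the high-signal state.
    p_high = 0.9 if a == "raise_power" else 0.2
    return "high_signal" if random.random() < p_high else "low_signal"

def R(s, a, s_next):
    # Reward signal computed after the transition.
    return 1.0 if s_next == "high_signal" else 0.0

def rollout(policy, s="low_signal", steps=10):
    """Run the MDP for `steps` time steps under a policy π(s) -> a."""
    total = 0.0
    for _ in range(steps):
        a = policy(s)
        s_next = T(s, a)
        total += R(s, a, s_next)
        s = s_next
    return total
```

A policy that always raises the power accumulates more reward than one that never does, which is the kind of preference the RL agent must discover from interaction alone.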