In search of a conditioned place preference test to assess the severity of experimental procedures

To compare the severity of experimental procedures and behavioural tests from an animal's perspective, novel methods are needed. In theory, one feasible approach could be the use of a conditioned place preference test (CPP). In this test, the preference for a certain area in a test apparatus is associated with an experimental treatment. Traditionally, the CPP is used to investigate, for example, the effects of drugs. Here, we wanted to develop a protocol that would instead enable us to compare the effects of different experimental procedures conducted with mice. Nine experiments with C57BL/6J mice were performed, varying the setup, the procedure duration, the neutral, to-be-conditioned stimuli (NS → CS; visual and/or tactile), and the unconditioned stimuli (US; fixation, food reward, or weighing), as well as the presentation order (NS before, after, or parallel to US). Unfortunately, none of the tested protocols resulted in a distinct preference. Moreover, even simple protocols using food reward as a treatment failed to result in a conditioned place preference. Overall, none of the protocols was sufficient to form a reliable association between NS and US. We have scrutinized the experimental setup in detail, and we cannot present a solution yet. However, we hope that our findings will help to create a working CPP to compare the severity of different experimental procedures for mice.


Introduction
Severity assessment of experimental procedures conducted with laboratory mice, such as the water maze or the Barnes maze, is a complex task, and reliable and comparable measures are still missing [1]. To gain insight into the animals' perspective, the effect of an experimental procedure could be investigated by means of preference tests, for example the conditioned place preference test (CPP). The CPP (or aversion) test is commonly used to assess the effect of drugs, such as ethanol (preference) or lithium chloride (aversion) [2][3][4][5][6]. The CPP has already been used to test for welfare by comparing the effect of morphine and saline on tumor-free and tumor-bearing mice [7].
The CPP is a form of classical (Pavlovian) conditioning, in which two neutral stimuli (NS, afterwards: conditioned stimuli, CS) each become associated with a different motivationally significant stimulus (unconditioned stimuli, US). As a consequence of this learning procedure, the previously neutral CS are able to evoke a response similar to the one caused by the US. In the subsequent preference test, the animal is offered a choice between spatial locations, each near one of the CS. If the subject spends more time near one CS than near the other, this stimulus is interpreted as more positively (or less negatively) associated than the second stimulus. Hence, with the help of a CPP, the valence of the unconditioned stimulus can be investigated.
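The interpretation rule above can be illustrated with a short sketch. It assumes that the time spent near each CS has already been measured; the function name and the use of a simple time fraction as the summary measure are illustrative assumptions, not the analysis used in this study.

```python
def preference_fraction(time_near_cs_a: float, time_near_cs_b: float) -> float:
    """Fraction of the test time spent near CS A (hypothetical summary measure).

    0.5 means no preference; values above 0.5 mean more time near CS A,
    values below 0.5 mean more time near CS B.
    """
    total = time_near_cs_a + time_near_cs_b
    if total <= 0:
        raise ValueError("total time must be positive")
    return time_near_cs_a / total


# Example: 360 s near CS A and 240 s near CS B in a 10 min test.
fraction = preference_fraction(360, 240)
```

A value above 0.5 would be read as CS A being more positively (or less negatively) associated than CS B.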
In a next step, the CPP should be modifiable in such a way that an experimental procedure as a US is paired with a CS. This has already been done in a similar manner by comparing the opportunity to run on a running wheel with no running (rats: [12,13]; hamsters: [14]). By offering a choice between two CS, each representing a different experimental procedure, the effects of the two procedures could be compared. Hence, with the help of the CPP, severity assessment or at least severity comparison of experimental procedures should be possible. In this manner, for example, the Morris Water Maze and the Barnes Maze, which both focus on learning and memory, could be compared with regard to their severity from the mice's perspective. Here, we aimed to develop such a CPP protocol for comparing severity.
But how best to conduct this CPP? There are various ways to perform CPPs, even in drug testing. As an example of the great variety in how CPPs are conducted, we have listed some studies and their differences in Table 1. In general, four or more conditioning sessions per treatment are conducted (i.e., pairing NS and US). Usually, this is done with one session per day (e.g., protocol by [2]). However, it is also possible to perform two sessions per day, thus shortening the protocol (rats: [15,16], mice: [6,17,18]).
When planning a CPP, the choice of the CS is important, meaning whether to use visual [6], tactile [16], or olfactory cues [14], or a combination of those [3,19,20]. This choice should not be underestimated, as some stimuli are more easily associated with specific reinforcers (US) than others [21].
Differences in the conduct of CPPs can also be found in the conditioning setup, in which the CS are presented: It is possible to use one compartment with exchanged stimuli for the conditioning sessions and a two- or three-compartment setup for the preference test [22]. However, it is more common to use a two- [23] or three-compartment [16] setup for both testing and conditioning, with the access to the second compartment blocked during the conditioning sessions. The advantage of this latter setup is that the spatial position of the compartments can also be used as a CS if translucent walls are used [3,19].
An additional point which has to be considered for conditioning is the timing of stimulus presentation. When testing the effect of drugs, for which the CPP is most commonly used, US (drug) and CS (e.g., pattern) can occur at the same time. Thus, the scheduled time spent in the conditioning compartment in proximity to the CS depends on the time between drug injection and the onset of its effect (e.g., 5 min in the conditioning setup protocol for drugs in general: [2], 2 h for cocaine: [6], 30 min for access to corn oil or water: [9]). However, if experimental procedures are to be tested, we have to choose: present the US and CS simultaneously, present the US before the CS, or present the CS before the US. This decision is important as timing might result in different effects: Presenting the pattern before the access to the running wheel (rats: [12]) produced place aversion, while presenting the pattern simultaneously (hamsters: [14]) or afterwards (rats: [13,24,25]) produced place preference. This effect of presentation time is also known for drugs like amphetamine [26] and nicotine [27,28]. The duration of presentation of the US and the CS might also depend on the tested US. In some cases, the simultaneous presentation of US and CS can last 24 h, as was done to compare social and isolated housing [22].
There are also great differences in the timing (duration and time point) of the habituation to the setup: Some studies conducted the habituation before the conditioning sessions, ranging from 5 min on 1 day [2] to 10 min on 3 days [11]. Other studies performed it on the last 2 days of conditioning, meaning before the final test, not before the conditioning sessions (for 20 min: [22,29]). Performing the habituation before the first conditioning session makes it possible to use this habituation also as a baseline preference test, to compare the preference of the same animals for the two CS before the conditioning sessions. If the habituation session is conducted at the end of the conditioning phase, habituation can only be conducted with a setup emptied of all stimuli (so as not to disturb the conditioning), and thus, baseline measurements are not possible.
Comparing different test durations (i.e., the final test), the range between studies is smaller: There are studies which measured the stay preference within 10 min (rats: [30], mice: [11]), and others within 45 min (mice: [7]). Thus, the duration of the final test is often chosen independently of the duration of the conditioning sessions.
Interestingly, not all studies report the time of day when conditioning or tests were conducted (e.g., no detailed instructions are given in [2], which only states that the procedure is performed during the light phase). However, this might have an influence, for example, on the motivation of the mice to gain food [31,32].
In the present study, we tried to develop a CPP test which would allow a comparison of two experimental procedures and their effects (US) by pairing the respective procedure with different CS. We focused on the CPP protocol by Cunningham and colleagues [2], but due to our experimental question, we had to make some changes. We report here all of our attempts to develop such a CPP protocol for the comparison of experimental procedures. For the design of our CPP, we had several boundary conditions: First, we wanted to conduct the experiments with mice, more precisely with C57BL/6J mice, because these are the most commonly used laboratory animals; therefore, finding a working protocol to conduct CPP tests with experimental procedures would make the greatest impact for severity assessment. Second, the CPP protocol itself should not be severe in any way because this could cause misinterpretations of the results. Thus, food or water deprivation (as used by [12] and [13] in their running wheel experiments) was not feasible. Third, in the optimal case, the CPP procedure should be able to detect not only large differences in severity of the tested experimental procedures but also subtle ones. For this reason, the experimental procedures which were used here as US are not moderate or severe but only mildly severe (according to the EU classification in annex VIII of the Directive 2010/63/EU). For example, the time in the restrainer was restricted to 1 min instead of the 15 min [33] or even 6 h [34] used in other studies.
In general, we want to emphasise that the experiments described here were all preliminary tests. However, we find it important to report them so that other researchers will know which approaches have already been tried.
Table 1. Examples of CPP experiments and their differing implementations.
See "Table 1.pdf". Articles are sorted by year and species. The articles [2] and [16] are instruction protocols. Note that habituation sessions are in some studies also used for baseline measurements. US = unconditioned stimulus, CS = conditioned stimulus, cond. = conditioning, / = conducted in different ways, ? = not described or unclear.

Overview
Altogether, we conducted twelve experiments, nine of them based on a conditioned place preference (CPP) design. Three additional experiments (not CPP), which we conducted to test a different approach for the comparison of severity, are described in the Supplements. In Table 2, the nine CPP experiments are summarised to give a better overview. In the following, we will go into detail about the animal groups, the different CPP setups used and the conduct of the experiments. Details on the neutral, to-be-conditioned stimuli (NS → CS, in the following called "conditioned stimulus" or CS) and the unconditioned stimuli (US) are provided in the Supplements. We always used a group of female C57BL/6J CrL mice which lived together in one cage system.
No. = number of experiment, Gr. = group, Age = age of the mice in months, US = unconditioned stimulus, NS = neutral stimulus, which becomes the CS = conditioned stimulus, sim = simultaneously, same = experiment was conducted in the husbandry room, other = experiment was conducted in a separate experimental room, not the usual husbandry room.

Groups
Throughout the course of the experiments, four groups of mice were used: All mice were female C57BL/6J CrL mice and purchased from Charles River, Sulzfeld. All mice within each group had different mothers and different foster-mothers to ensure maximal behavioural variability within the inbred strain. At the age of five weeks, transponders were implanted, a procedure performed under anaesthesia and analgesia (for details see section Transponder implantation). All groups were always handled by tunnel handling (for more details see https://wiki.norecopa.no/index.php/Mouse_handling, as well as [35,36]).
Group 1:
A group of thirteen mice was purchased in December 2017 at the age of 3 weeks. This group was used for experiments 1-3 (12 months of age at the start of experiment 1). This group had previously participated in other experiments, e.g., a T-maze preference test [37] as well as the development and first tests of an RFID-based tracking system for home-cage-based choice tests [38].

Group 2:
The second group consisted of twelve mice and was purchased in June 2019 at the age of 4 weeks. This group was used for experiments 4-6 (2 months of age at the start of experiment 4). This group had previously participated in other experiments, e.g., the validation of an RFID-based tracking system for home-cage-based choice tests [38], and in between the experiments described here, they were also part of a T-maze test [37].

Group 3:
The third group consisted of twelve mice and was purchased in September 2019 at the age of 4 weeks. This group was used for experiments 7-8 (13 months of age at the start of experiment 7). This group had previously participated in other experiments, e.g., the development of a home-cage-based cognitive bias test [39].

Group 4:
The fourth group consisted of twelve female C57BL/6J CrL mice and was purchased in September 2020 at the age of 4 weeks from Charles River, Sulzfeld. This group was used for experiment 9 (10 months of age at its start). This group had also previously participated in other experiments, e.g., the development of a home-cage-based cognitive bias test [39].
Additional remarks: A few weeks before experiment 5, we noted that eleven of the twelve mice of group 2 completely or partly lacked their whiskers, probably due to plucking / barbering behaviour. The same was true for all twelve mice of group 3 in experiments 7 and 8. It has to be noted that whisker loss as in groups 2 and 3 can influence tactile-guided behaviour, for example, novel object recognition or open field activity [40,41]. In addition, barbering is a model for a disorder (trichotillomania). It is therefore important to note that barbering mice were found to show no difference in learning ability itself, with the exception of an extra-dimensional shift task [42]. In our experiments, only simple learning was required. Thus, in the sense of the 3R, we decided against ordering a new group of mice for the experiments (5 and 6). Still, we changed the previous experimental design from a tactile cue as CS to a visual one, as the whisker loss should not have influenced the mice's ability to perceive visual cues. In addition, we essentially repeated experiment 8 with a fully whiskered group in experiment 9, to compare the results of a whiskered and a de-whiskered group.
It also has to be noted that we used the mice until a rather old age: In the sense of the 3R, for these preliminary tests we wanted to "re-use" animals which had already participated in other experiments instead of ordering new ones. In general, repeatability of activity measures increases with the age of mice [43], and C57BL/6J mice performed well in visual detection, pattern discrimination and visual acuity tasks even until 24 months [44,45]. Therefore, we argued that the use of older mice is feasible. In addition, the tests in which the animals had participated beforehand were of a different nature, and thus, we see no reason to anticipate an after-effect of these.

Housing
Groups of mice were kept in two connected type IV macrolon cages (L x W x H: 598 x 380 x 200 mm, Tecniplast, Italy) with filter tops. As connection between the cages, a Perspex tunnel (40 mm in diameter) was used. This cage system was chosen due to other research purposes, and mice had lived in it since they were around 2 months (group 1), 3 months (group 2) or 1 month old (groups 3 and 4). In groups 3 and 4, the tunnel had been replaced by an AnimalGate (TSE systems, Germany) during the months preceding the CPP. Food (autoclaved pellet diet, LAS QCDiet, Rod 16, LASvendi, Germany) and tap water (two bottles per cage) were available ad libitum. Both cages were equipped with bedding material (Poplar Granulate 2-3 mm, Altromin, Germany) of 3-4 cm height, a red house (The MouseHouse, Tecniplast, Italy), nesting material (papers, cotton rolls, strands of paper nesting material), and two wooden bars to chew on. Both cages also contained a Perspex tunnel (40 mm in diameter, 17 cm long), which was used for tunnel handling. Groups 3 and 4 also experienced weekly changing cage equipment (houses, nesting material, active enrichment filled with millet; this was introduced after the findings of [46]).
(winter time) or 7:30 and 8:00 (summer time) a sunrise was simulated: A wake-up light (HF3510, Philips, Germany) gradually increased the light intensity until the overhead light went on.
Once per week, home cages were cleaned and all mice were scored and weighed. In this context, mice also received a colour code on the base of their tails, using edding 750 paint markers, to facilitate individual recognition.

Experimental Room
In experiments 1 to 5, we used a separate room for the experimental procedures and not the room in which the animals were kept. Between experiments 4 and 5 (before 4.1, not a CPP experiment, see Supplements), the housing room was changed, and for logistical reasons, the experimental room also had to be changed. For both experimental rooms, temperature was maintained at 22 ± 3 °C and humidity at 55 ± 15 %. The rooms had an automatic 12 h/12 h dark/light cycle with the light phase starting at 7:00 a.m. (winter time) or 8:00 a.m. (summer time), respectively.

Transponder implantation
At the age of five weeks, transponders (FDX-B transponder according to ISO 11784/85; group 1: Planet-ID, Germany; groups 2-4: Euro I.D., Germany) were implanted under the skin in the neck of the mice. In group 1, all mice received an analgesic (meloxicam, 1 mg/kg) two hours before the procedure. The transponder implantation itself was performed under isoflurane anaesthesia (induction of anaesthesia: 4 l/min 4 %; maintenance of anaesthesia: 1 l/min 1-2 %). RFID (radio frequency identification) transponders were injected subcutaneously in the neck, directly behind the ears, so that they were rostrocaudally oriented (for a detailed description see [38]). After transponder implantation, mice were placed individually in a separate cage with bedding and sheets of paper, and monitored until they were fully awake again. Then they were returned to their home cage. In group 1, two mice lost their transponders after the first implantation, and for those two mice the transponder implantation was repeated at the age of 8 weeks.
For groups 2-4, the administration time of the analgesic was moved to the evening before the procedure. We hoped to reduce transponder loss this way: By administering the meloxicam earlier, the analgesic effect was expected to cease before the dark phase (active phase) after the implantation, and mice would be more hesitant to focus on the injection site. Implantation of the transponders was performed in the same way as in group 1. In group 2, no transponder was lost. In groups 3 and 4, one transponder each was lost and the procedure was repeated with the respective mouse the very next day (group 3) or five days later (group 4).

Setups
Setup 1: Two cages connected via a tunnel
This setup was used in experiments 1 and 2. For the baseline and the preference test, two type II cages (LWH: 225 x 167 x 140 mm, Tecniplast, Italy) were connected with a tunnel (similar to the tunnel from the home cage system: 4 cm in diameter, 24 cm long). Both cages were closed by a lid and a filter top. Between the cages, an automated positioning system based on light barriers was installed (see Fig. 1A): The light barriers were 15 cm apart (more than one mouse length from nose to tail tip) and connected to an Arduino Leonardo microcontroller with a real-time clock, an SD card and a time display. Whenever changing cages and passing through the tunnel, a mouse would interrupt the two light barriers. Custom-written software ensured that each interruption of the light barriers was saved with a time stamp onto the SD card. If the left light barrier was interrupted last, the mouse was counted as being in the left cage, and if the right light barrier was interrupted last, the mouse was counted as being in the right cage. As the positioning system did not register the start time (i.e., when the lid was closed), we used video recordings to obtain this time point. To verify the automatic positioning method, the automated results for the habituation session were compared to video recordings. (The positioning system used here was a prototype of the system which later became the "MoPSS": The Mouse Positioning Surveillance System is an open-source RFID-based tracking system suitable for individual tracking of group-housed mice [38].)
For conditioning sessions, an independent type II cage was used. Note that, therefore, this setup did not support the usage of external spatial cues because conditioning sessions were performed in a cage differing from the test cage (similar to a one-compartment design, see [19]).
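The "last interrupted barrier" rule of the positioning system can be sketched as follows. This is not the Arduino software used in the experiments; the event format (time stamp in seconds plus "L" or "R" for the interrupted barrier), the function name, and the explicitly supplied start side are assumptions made for illustration. The session start is passed in separately because, as described above, it was taken from the video recordings.

```python
def time_per_cage(events, start, end, start_side):
    """Sum the time one mouse spent in the left ("L") and right ("R") cage.

    events: (timestamp_s, barrier) tuples sorted by time; barrier is "L" or "R".
    start, end: session limits in seconds (start taken from video).
    start_side: cage into which the mouse was placed ("L" or "R").
    The barrier that was interrupted last defines the current cage.
    """
    totals = {"L": 0.0, "R": 0.0}
    side, last_t = start_side, start
    for t, barrier in events:
        if t < start or t > end:
            continue  # ignore interruptions outside the session
        totals[side] += t - last_t  # credit elapsed time to the current cage
        side, last_t = barrier, t   # barrier interrupted last = new position
    totals[side] += end - last_t    # time from the last event to session end
    return totals


# Hypothetical 10 min session: the mouse crosses to the right cage and back.
events = [(10, "L"), (12, "R"), (100, "R"), (103, "L")]
totals = time_per_cage(events, start=0, end=600, start_side="L")
```

In this sketch, the short transit between the two barriers is credited to the cage whose barrier was interrupted last, which mirrors the rule stated in the text.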
In this setup, bedding material or different kinds of gravel were used as CS.Between mice, the CS were removed and cage, lid, and tunnel cleaned with 70 % ethanol.
For conditioning sessions, each mouse had its own conditioning cage to facilitate the procedure (no cleaning between mice).It was emptied, cleaned with 70 % ethanol and re-filled with the CS (bedding material or gravel) between the experimental days.In addition, between mice, the weighing and fixation surface were cleaned with 70 % ethanol.

Setup 2: One cage separated by a barrier
This setup was used in experiments 3-6. The second setup was designed based on the model of [2]: Guiding bars were added to a type III cage (LWH: 425 x 276 x 153 mm, Tecniplast, Italy) to hold either a small barrier (2 cm height, for habituation and preference test) or a plate (about 13 cm height, for conditioning sessions). Thus, the cage was divided into two areas of the same size, which contained different CS (see Fig. 1B and 1C). The small barrier was used during the habituation and preference test to provide an actual visual and physical separation of the two areas. The plate was used during conditioning sessions. Note that as the outside walls were transparent, this setup allowed external spatial cues, such as proximity and colour of walls. This setup was not closed by a lid but by a Perspex plate, so that video recordings were possible from above. The camera was mounted on a metal construction (in the first experiment out of beams by fischertechnik GmbH, Germany, later replaced by
In this setup, either different metal plates (experiments 3 and 4) or visual patterns (experiments 5 and 6) were used as CS, in both cases positioned on the floor only. As we had only one setup, it was cleaned between mice in conditioning sessions, habituation, and the final test: In experiments 3 and 4, the setup and its barrier/plate were cleaned with 70 % ethanol between mice. The floor plates were first cleaned with water, dried with paper, and then cleaned with 70 % ethanol and dried with paper again. In experiment 5, between mice the setup was first cleaned with water (to wash away the liquid and general soiling) and then with 70 % ethanol. In experiment 6, between mice the setup was only cleaned with water if urination or defecation occurred. This was done to reduce potential odour effects of the ethanol itself (see also Discussion). In all experiments, the setup was cleaned with 70 % ethanol between experimental days.

Setup 3: Two compartments separated by a wall
This setup was used in experiments 7-9. Two compartments (LWH: 32 x 11 x 20 cm) consisting of grey plastic were joined together (see Fig. 2A, based on [16]). Three of the four walls were additionally covered with Perspex plastic, behind which a sheet of laminated paper with a pattern could be placed. The fourth wall was the connection to the other compartment and was removable. This wall either contained a hole (experiment 8: 6 cm width and 7 cm height, experiment 9: 6 cm diameter) to allow the mice to change compartments (for habituation, baseline and test), or was solid (conditioning sessions). In experiment 8, between both compartments (between the removable walls) there was a small area of approximately 2 cm length belonging to neither of the compartments (width and height were the same as the compartments). In experiment 9, this area was filled with a fitting wall of 3 cm width (3D printed, black PLA). Accordingly, the separation wall for the conditioning sessions was printed out of the same material.
In experiment 8, the floor consisted of the same grey plastic as the walls. In experiment 9, we placed 3D printed black plates containing structures on the floor to add tactile cues. A metal construction (MakerBeam B.V., The Netherlands) was built to hold a camera, so that the apparatus could be filmed from above.
The setup was cleaned with 70 % ethanol before or after each experimental day.
Between mice, the setup was only cleaned if faeces or urine were present, and cleaning was done just with water.This was done to reduce potential odour effects of the ethanol itself (see also Discussion).
Note that this setup excluded external spatial cues due to its opaque walls.

General Procedure
For all of the following experiments, some parameters always remained the same:
Randomization: The pairing of neutral, to-be-conditioned stimulus (NS → CS) and unconditioned stimulus (US) was randomized for all mice (e.g., for half of the mice CS A was paired with US 1 and for the other half CS A was paired with US 2). Also, the mice were randomly assigned to start with US 1 or US 2. Within these subsets, the presentation side of the CS was randomized in such a way that about half of the mice experienced CS A left and CS B right and the other half the other way around. In addition, the order in which the mice were taken out of the home cage for the sessions was randomized, and this was also the case for the order of mice in the baseline and the preference test. Moreover, the start compartment into which the mice were placed at the beginning of the baseline and final preference test (left or right) was randomized.
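A nested, roughly balanced assignment of this kind could be generated as in the following sketch. It is only an illustration under stated assumptions: the function names, the dictionary keys and the use of Python's `random` module are hypothetical and do not describe how the randomization was actually carried out.

```python
import random


def split_in_half(items, rng):
    """Shuffle items and split them into two halves of (nearly) equal size."""
    items = list(items)
    rng.shuffle(items)
    half = len(items) // 2
    if len(items) % 2 and rng.random() < 0.5:
        half += 1  # with an odd count, the extra animal goes to a random half
    return items[:half], items[half:]


def assign_conditions(mouse_ids, seed=None):
    """Return {mouse: {"cs_a_paired_with", "first_us", "cs_a_side"}}."""
    rng = random.Random(seed)
    plan = {}
    # Level 1: which US is paired with CS A (CS B gets the other US).
    for us_for_a, group in zip(("US 1", "US 2"), split_in_half(mouse_ids, rng)):
        # Level 2: which US the mouse starts with.
        for first_us, subgroup in zip(("US 1", "US 2"), split_in_half(group, rng)):
            # Level 3: on which side CS A is presented.
            for side, mice in zip(("left", "right"), split_in_half(subgroup, rng)):
                for mouse in mice:
                    plan[mouse] = {
                        "cs_a_paired_with": us_for_a,
                        "first_us": first_us,
                        "cs_a_side": side,
                    }
    return plan


def session_order(mouse_ids, seed=None):
    """Random order in which mice are taken out of the home cage."""
    order = list(mouse_ids)
    random.Random(seed).shuffle(order)
    return order


plan = assign_conditions(range(1, 13), seed=1)
```

With twelve mice, the pairing and the starting US are split exactly six to six in this sketch, whereas the side can only be balanced approximately because each subset contains three animals.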
Blinding: Baseline and final preference tests were video recorded. The video recordings were analysed without knowledge of the pairing of treatment (procedure, US) and CS or side for the individual animals.
Exclusion: Animals which did not change compartments even once during the baseline or final preference test were excluded from analysis. This was only relevant for setup 1. Here, in both experiments, one mouse (the same in both experiments) did not change cages even once during the 10 min. Therefore, we could not tell with certainty that the mouse was aware of both options, and excluded it from the analysis. In addition, in experiment 2, the baseline test was conducted with thirteen mice. However, one mouse had to be killed before the start of the conditioning sessions due to health issues independent of the experiment. Thus, the experiment itself was only performed with twelve mice.
If the experiment lasted 6 days, the first day for baseline testing was followed by 4 days of conditioning (two conditioning sessions per day, with a break of a few days between the baseline test and the first conditioning session to evaluate the baseline for a bias in preference), and 1 day for the final preference test. In contrast to other studies using the 6 day schedule (rats: [15], mice: [6,18]), we decided against a morning and an afternoon conditioning session because this would have resulted in waking the mice in the middle of their inactive phase, and we did not want to disrupt their circadian rhythm more than necessary. Instead, we performed the conditioning sessions directly after each other. In addition, to prevent a time effect (early morning versus late morning), the procedure tested last on one day was the first to be tested the next day (e.g., conditioning day 1: US 1, US 2, conditioning day 2: US 2, US 1, conditioning day 3: US 1, US 2, and so on).
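The alternating order of the two daily conditioning sessions can be written down as a short sketch; the function name and the default of four conditioning days are illustrative assumptions based on the 6 day schedule described above.

```python
def daily_orders(first_us, second_us, n_days=4):
    """Order of the two conditioning sessions on each conditioning day.

    The procedure tested last on one day is tested first on the next day.
    """
    order = (first_us, second_us)
    schedule = []
    for _ in range(n_days):
        schedule.append(order)
        order = (order[1], order[0])  # swap the order for the following day
    return schedule


schedule = daily_orders("US 1", "US 2")
# [("US 1", "US 2"), ("US 2", "US 1"), ("US 1", "US 2"), ("US 2", "US 1")]
```

Over four conditioning days, each US is thus presented four times in total, twice as the first and twice as the second session of the day.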

Conditioned Stimuli
Flooring Material (Tactile, Visual and Odour Cues)
Experiment 1: As CS, bedding material was used, similar to the studies by [8,29] and [47]. We used "pure" and "comfort white" bedding material (JRS, J. Rettenmaier & Söhne GmbH + Co KG, Germany). Although they both consist of cellulose, they are distinguishable in size, texture and, even for a human, in odour (for a picture see Supplements).
Experiment 2: As a modification of experiment 1, different types of gravel were used (obtained in a local DIY store). We intended to use pumice and marble, but as mice showed a clear preference for pumice during a baseline test (probably due to its differing thermal characteristics), we used marble and quartz instead. The gravel was thoroughly washed and disinfected by heating before using it for the experiment. Marble and quartz differed in colour and shape (for a picture see Supplements). Thus, they included visual and tactile cues.

Metal Plates (Tactile and Visual Cues)
Experiments 3 and 4: In both experiments, plates were used, referring to the studies of [2,3] (only tactile stimuli, no additionally coloured walls). Here, we used a metal plate with holes and a metal plate with slits (for a picture see Supplements).
Plates were obtained in a local DIY store and cut to 180 x 210 mm to fit into one half of a type III cage. Both types of plates consisted of aluminium. Thus, flooring materials included a visual (dots vs. stripes) and a tactile cue (holes vs. slits). In addition, plates were placed in setup 2, which provided a spatial cue (left vs. right side of the conditioning cage).
In addition, all plates might have provided thermal cues. In experiment 3, it was noted while disinfecting the plates with ethanol in between the mice that the metal became very cold from evaporation. Thus, depending on the time between cleaning and placing the next mouse into the conditioning cage, the "coldness" of the plate might have differed for every session and every mouse. Also, this "coldness" might have been different for the two plates due to their recess structures (many small holes or large long slits). This might have acted as an additional cue.
To specify this effect, we took measurements after the end of experiment 3 using an infrared camera. We did not observe a temperature difference between the plates themselves. However, there was a temperature difference between the measurement directly after the cleaning vs.

Patterns (Visual Cues)
Experiment 5: A laminated paper with a pattern was placed directly under each cage half (setup 2). The patterns were designed after the description of [2]: One pattern consisted of black circles (diameter: 6.4 mm), with each row slightly shifted relative to the one before. The other pattern consisted of wide black lines (width: 3.2 mm, space between edges: 6.4 mm) (stripes).
Experiment 6: Self-designed patterns were used, consisting of black and white squares. Care was taken that the patterns contained the same amount of black and white space, but differed in block size and alignment of blocks (shifted vs. linear). This resulted in one pattern similar to a chessboard and one similar to fabric texture (for a picture see Supplements).
Experiments 7, 8 and 9: Patterns were either horizontal or vertical black and white stripes (similar to [44]).

Plastic Plates (Tactile and Visual Cues)
Experiment 9: Tactile cues were used in addition to visual cues (referring to the studies of [3,19]). This was done by adding 3D printed plastic plates (material: PLA) onto the floor of the procedure environment and the conditioning setup (for a picture see Supplements).
Plates for the procedure environment (conditioning session) were white, 2 mm high and contained protruding tactile cues: either bars or dots. Plates for the conditioning setup were black, 2 mm high and contained holed tactile cues: either squares (diagonally organised) or rounded crosses (in parallel), similar to the patterns described by [48]. Plates for the preceding habituation (procedure environment and test setup) were smooth without any additional tactile cues.

Unconditioned Stimuli
Fixation (Restraint by Hand)
Experiments 1 and 2: Before each mouse, the cage lid and the cage on which the lid was placed were cleaned with 70 % ethanol in order to eliminate any olfactory cues left by previously tested mice. The mouse to be tested was then taken out of the conditioning compartment by tunnel handling and placed on the surface. If the mouse did not leave the tunnel voluntarily, the tunnel was tilted until the mouse gently slid out. While the tail was held with one hand, the mouse was restrained by taking the loose skin of the scruff between thumb and index finger of the other hand and lifting the mouse off the lid (as described, e.g., in [35]). The mouse was held for 20 s before it was released straight into the conditioning compartment (experiment 1) or back onto the surface (experiment 2). The mouse could then go back into the tunnel and was returned to the transportation cage.
Experiment 3: Before each mouse, the surface on which the mice were placed (on top of an upside down cage, type 1144B, LWH: 331 x 159 x 132 mm, Tecniplast, Italy) was cleaned with 70 % ethanol in order to eliminate any olfactory cues left by previously tested mice. The mouse to be tested was then taken out of the conditioning compartment by tunnel handling and placed on the surface. The back opening of the tunnel was sealed by hand, and we waited until the mouse left the tunnel by itself. While the tail was held with one hand, the mouse was restrained by taking the loose skin of the scruff between thumb and index finger of the other hand and lifting the mouse off the surface (as described, e.g., in [35]). The mouse was held for 20 s before it was released back onto the surface. The mouse could then go back into the tunnel and was returned to the transportation cage.

Weighing
The mouse to be tested was taken out of the conditioning compartment by tunnel handling and placed into the weighing vessel on top of a scale. In case of the glass jar (experiments 1, 2 and 4), a lid was placed on top to prevent the mouse from climbing out. After the weight had been noted, the mouse was taken out of the weighing vessel by tunnel handling and placed into the conditioning compartment (experiment 1) or the transportation cage (experiments 2 and 4).

Millet in Separate Cage
Experiment 4: The mouse was taken out of the conditioning compartment by tunnel handling and placed in a type III cage (LWH: 425 x 276 x 153 mm, Tecniplast, Italy) filled with bedding material (Poplar Granulate 2-3 mm, Altromin, Germany, same as in the home cage). The cage contained 0.1 g millet at one end as food reward. If the mouse started feeding on the millet, it was taken out of the cage immediately after it stopped feeding (independent of the amount eaten). Feeding was defined as sitting beside the millet for some time and audibly eating the grains (at least 3 grains). If the mouse did not start feeding within 1 min, it was taken out of the cage by tunnel handling and returned to the transportation cage. Between mice, the cage was cleaned with 70 % ethanol and the bedding material was changed. Mice were habituated to the millet before the experiment by offering it for three days in the morning, three times 2 g in different places in the cage.
Experiment 6 and 7: see section Millet and Bedding.
Experiment 8: The mouse was taken out of the home cage by tunnel handling and placed in a type III cage (LWH: 425 x 276 x 153 mm, Tecniplast, Italy) filled with bedding material (Poplar Granulate 2-3 mm, Altromin, Germany, same as in the home cage). Directly after the insertion of the mouse, 0.1 g millet was placed at one end of the cage. When the mouse stopped feeding on the millet and no remaining grains were visible, or at the latest after 1 min, the mouse was taken out of the cage by tunnel handling and placed into the conditioning compartment. Between mice, the cage with the bedding was changed, and at the end of each conditioning session, the bedding was removed and the cage was cleaned with 70 % ethanol and filled with new bedding. Mice were familiar with millet from previous experiments.

Experiment 9: The mouse to be tested was taken out of the home cage by tunnel handling and (similar to restrainer, experiment 9) placed into a type III cage (LWH: 425 x 276 x 153 mm, Tecniplast, Italy) which had a 3D printed white plate with tactile stimuli on its floor. The mouse had 5 s to inspect the cage (and its stimuli) before 0.01 g (16 grains) of millet was placed into the cage, always at the same spot. After 1 min, the mouse was taken out by tunnel handling and placed into the conditioning compartment (setup 3). Between mice, the floor plate was cleaned with water if the mouse had urinated or defecated. Between experimental days, the cage and its floor plate were cleaned with 70 % ethanol. Mice were familiar with millet from active enrichment in their home cage.

Restrainer
Experiment 8: The mouse to be tested was taken out of the home cage by tunnel handling and (similar to millet, experiment 8) placed into a type III cage (LWH: 425 x 276 x 153 mm, Tecniplast, Italy) filled with bedding material (Poplar Granulate 2-3 mm, Altromin, Germany, same as in the home cage). It was then guided or pushed by hand into the tunnel of the custom-built restrainer (for a detailed description and picture see Supplements). The first (outer) restrainer barrier was inserted and the timer was started. If the size of the mouse allowed it, the second (inner) barrier was then inserted to further restrict the space. After 1 min, the mouse was directly released into the conditioning compartment (setup 3). Between mice, the restrainer, barriers and seal were cleaned with water if the mouse had urinated or defecated. Between experimental days, the restrainer was cleaned with special tissues free from alcohol and aldehydes (Microbac Tissues, Hartmann, Germany) and water, while the cage was cleaned with 70 % ethanol.
Experiment 9: The same custom-built restrainer as in experiment 8 was used. The mouse to be tested was taken out of the home cage by tunnel handling and (similar to millet, experiment 9) placed into a type III cage (LWH: 425 x 276 x 153 mm, Tecniplast, Italy) which had a 3D printed white plate with tactile stimuli on its floor. The mouse had 5 s to inspect the cage (and its stimuli) and was then guided or pushed by hand into the tunnel of the restrainer. Immediately, the first (outer) restrainer barrier was inserted and the timer was started. The second (inner) barrier was then inserted to further restrict the space. After 1 min, the mouse was directly released into the conditioning compartment (setup 3). Between mice, the restrainer, barriers, seal and floor plate were cleaned with water if the mouse had urinated or defecated. Between experimental days, the restrainer was cleaned with special tissues free from alcohol and aldehydes (Microbac Tissues, Hartmann, Germany) and water, while the cage and its floor plate were cleaned with 70 % ethanol.

Fluids
Experiment 5: Diluted almond milk and tap water were used as US. The almond milk was prepared using 1 part "Mandel Drink" (containing water, 7 % almonds, and salt, Alnatura GmbH, Germany) and 3 parts tap water from the husbandry room. The diluted almond milk was prepared in the morning right before the first conditioning session and kept in a fridge from then on; on each of the following days, only the amount needed for that day was taken into the experimental room. Mice were unfamiliar with almond milk.
This procedure was performed in the conditioning compartment (setup 2). The compartment was covered by a Perspex plate, and access to the respective fluid was given through a bottle inserted into a hole in the covering plate. Between mice, the experimental cage was cleaned with 70 % ethanol.
During the first day of conditioning (sessions 1 and 2), bottles were filled with 200 ml of fluid, and it was noted that the bottles trickled a lot, leading not only to drops on the floor but to a puddle of approximately 5 cm diameter. Because mice hesitated to step into the puddle, the bottle directly above it was too far away for some mice to reach. To improve this situation, the plate for bottle insertion was altered for the second day of conditioning so that it hung about 2 cm lower than before. In addition, bottles were filled with 600 ml of fluid, which reduced the trickling (although it did not stop, see picture in Supplements).

Millet and Bedding
Experiments 4 and 8: see section Millet in Separate Cage.
Experiment 6: For this procedure, millet (Goldhirse, Spielberger Mühle, Germany) and bedding material (Poplar Granulate 2-3 mm, Altromin, Germany, the same as used in the home cage) were used. In one treatment, the bedding material was presented alone; in the other, it was mixed with millet. Mice were familiar with millet from previous experiments and were also habituated to feeding on millet outside their home cage environment.
The procedure was performed in the conditioning compartment itself (setup 2). In the middle of the compartment, a small amount of bedding material (as much as fitted into a 0.5 ml Eppendorf tube) was placed, which was either mixed with 0.1 g millet or not. After the mouse had been placed inside the compartment, the compartment was covered by a Perspex plate. Between mice, the experimental cage was cleaned with water if the mouse had urinated or defecated.
Experiment 7: For this procedure, millet (Goldhirse, Spielberger Mühle, Germany) and bedding material (Poplar Granulate 2-3 mm, Altromin, Germany, the same as used in the home cage) were used (unmixed). Mice were unfamiliar with millet and not habituated to feeding on millet outside their home cage environment, but they were thoroughly habituated to the conditioning compartment (without the CS), so we expected them to feed on the millet immediately.
The procedure was performed in the conditioning compartment itself (setup 3). After the mouse had been placed inside the conditioning compartment, we waited 30 s before the treatment was applied. This was either 0.1 g millet or a visually similar amount of bedding material. The compartment was not covered with a plate. Between mice, the experimental cage was cleaned with water if the mouse had urinated or defecated.

Transportation
For those experiments which were conducted in an experimental room instead of the housing room, transportation was necessary. In experiments 1 to 5, transportation cages were used for this procedure because transportation of the home cage setup was not possible. Mice were taken out of the home cage by tunnel handling and placed into a transportation cage (type IV macrolon cage, LWH: 598 x 380 x 200 mm, Tecniplast, Italy). The transportation cage was equipped with the usual bedding material, a red house (The MouseHouse, Tecniplast, Italy) and two sheets of paper as nesting material/enrichment. Food (pellet diet, LAS QCDiet, Rod 16, autoclavable, LASvendi, Germany) and water (tap water, two bottles) were provided ad libitum.

Video Analysis
During the baseline and the preference test, the animals were placed into the setup for 10 min. Of this total duration (starting at the moment the filter top or plate was placed on top of the setup or, in experiments 6 to 9, at the moment the mouse had left the handling tunnel with all four feet), the first minute was taken as habituation time and not analysed any further. The remaining nine minutes were then analysed.
Setup 1 (experiments 1 and 2): Time spent in each component of the setup was recorded by an automatic positioning system based on light barriers (see section Setup 1: Two cages connected via a tunnel), so no specific video analysis was performed. [49]). A change of compartment was counted whenever a mouse moved with all four paws over the barrier into the other compartment. Time in-between, when the mouse was partly in one and partly in the other compartment (setup 2), was counted as still belonging to the side the mouse had last been on.
Setup 3 (experiments 7 to 9): Time spent in each component of the setup was recorded with the help of BORIS. Setup 3 had an area between the compartments, which was small enough for the mouse to be still partly in one of the compartments. Here, we counted a mouse as being in a compartment when all four paws were inside it, and as leaving a compartment as soon as its head was in the area between the compartments. Duration of stay was normalised to the time spent in one of the compartments (excluding the time in-between compartments).
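The scoring and normalisation rule for setup 3 can be sketched as follows. This is a minimal illustration and not the analysis script used in the study: the event format (onset time in seconds plus a zone label), the zone names `left`, `right` and `between`, and both function names are assumptions, since the structure of the BORIS export is not described here.

```python
from typing import Dict, List, Tuple

HABITUATION_S = 60.0  # first minute of the trial is discarded
TRIAL_END_S = 600.0   # trials lasted 10 min


def zone_durations(events: List[Tuple[float, str]],
                   start: float = HABITUATION_S,
                   end: float = TRIAL_END_S) -> Dict[str, float]:
    """Sum the time per zone inside the analysis window [start, end].

    `events` holds (onset in s, zone) pairs sorted by onset; a zone lasts
    until the next onset, or until `end` for the last event.
    """
    totals: Dict[str, float] = {}
    for i, (onset, zone) in enumerate(events):
        offset = events[i + 1][0] if i + 1 < len(events) else end
        lo, hi = max(onset, start), min(offset, end)
        if hi > lo:  # keep only the part that overlaps the window
            totals[zone] = totals.get(zone, 0.0) + (hi - lo)
    return totals


def percent_of_stay(totals: Dict[str, float],
                    exclude_between: bool = True) -> Dict[str, float]:
    """Express time in each compartment as a percentage.

    With exclude_between=True (setup 3), the denominator is only the time
    spent inside one of the two compartments.
    """
    left = totals.get("left", 0.0)
    right = totals.get("right", 0.0)
    between = totals.get("between", 0.0)
    denom = left + right if exclude_between else left + right + between
    if denom <= 0:
        raise ValueError("no time recorded in the analysis window")
    return {"left": 100.0 * left / denom, "right": 100.0 * right / denom}


# Hypothetical trial: all values are invented for illustration.
example_events = [(0.0, "left"), (30.0, "between"), (40.0, "right"),
                  (340.0, "between"), (360.0, "left")]
example_percent = percent_of_stay(zone_durations(example_events))
```

Excluding the in-between time from the denominator means the two percentages always sum to 100, so a comparison against the 50 % chance level stays meaningful even if a mouse spends a long time between the compartments.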

Statistical Analysis
Data were analysed with regard to side preference (left vs. right compartment), the CS presented in the compartments (e.g., horizontal vs. vertical stripes) and the US paired with the presented CS (e.g., millet vs. bedding). This was also done for the baseline test. In this case, the test was done for the compartment with which a specific procedure was going to be paired ("procedure compartment"), although the mice had not yet experienced the combination of compartment and procedure. In general, the amount of time in each compartment was calculated as a percentage, either of the complete time after 1 min of habituation (9 min, setups 1 and 2) or of the total time in one of the compartments (9 min minus the time spent between compartments, setup 3).
All statistics were calculated using R. To test for normal distribution, the Shapiro-Wilk test was performed. In all experiments, the data did not differ significantly from a normal distribution (p > 0.05); therefore, a one-sample t-test was used to compare the preferences with a chance level of 0.5. For the comparison between pre-conditioning and post-conditioning results, a paired t-test was used. In all statistical tests, the significance level was set to 0.05 and the tests were two-sided. As the experiments were considered preliminary studies, the results were not Bonferroni corrected.
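The two test statistics can be illustrated with a short sketch. The study itself used R, where such tests are typically run with `t.test` (the exact calls are not stated); the Python functions below only compute the t statistic and degrees of freedom, and their names and the example data are hypothetical. The p-value would follow from the t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def one_sample_t(values: Sequence[float], mu: float) -> Tuple[float, int]:
    """t statistic and degrees of freedom for H0: mean(values) == mu."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    standard_error = stdev(values) / math.sqrt(n)  # sample SD (n - 1)
    return (mean(values) - mu) / standard_error, n - 1


def paired_t(before: Sequence[float],
             after: Sequence[float]) -> Tuple[float, int]:
    """Paired t-test as a one-sample test on the per-animal differences."""
    if len(before) != len(after):
        raise ValueError("both trials must contain the same animals")
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, 0.0)


# Hypothetical proportions of time in the procedure compartment.
baseline = [0.48, 0.55, 0.51, 0.46, 0.53, 0.50]
final_test = [0.52, 0.57, 0.49, 0.50, 0.58, 0.47]

t_chance, df_chance = one_sample_t(final_test, 0.5)   # test vs. chance
t_change, df_change = paired_t(baseline, final_test)  # baseline vs. test
```

The sign of t depends only on the direction of the comparison (here: value minus chance level, and test minus baseline), so a negative t does not by itself indicate an error.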
Results are always reported as the mean ± standard deviation.
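Because results are given as mean ± standard deviation, the one-sample t statistic can be recovered from the reported values alone, which allows a quick plausibility check. The sketch below uses the baseline values of experiment 1 (59.95 ± 7.54 %); the sample size of 12 is an assumption (13 mice minus the one excluded animal), and the function name is hypothetical.

```python
import math


def t_from_summary(mean_pct: float, sd_pct: float, n: int,
                   chance_pct: float = 50.0) -> float:
    """One-sample t statistic from mean, sample SD and group size."""
    if n < 2 or sd_pct <= 0:
        raise ValueError("need n >= 2 and a positive standard deviation")
    return (mean_pct - chance_pct) / (sd_pct / math.sqrt(n))


# Baseline of experiment 1; n = 12 is assumed (13 mice, one excluded).
t_baseline = t_from_summary(59.95, 7.54, 12)
print(round(t_baseline, 2))  # 4.57
```

The magnitude agrees with the reported t = -4.57; the opposite sign would result if the test was computed for the other compartment.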

Experiments and Results
In this section, we give a short description of the procedure used in each experiment and directly provide the results. We decided upon this structure because the alterations in the procedures were directly related to the results of the previous experiment, and we believe that our considerations become clearer this way.
Note that in this main article, we present the results of the video analyses (the baseline and the final test). An overview of the results, i.e., mean, standard deviation, and p-values for the tested factors US (procedure), CS (cue) and side, is given in Table 3. In Fig. 3 and Fig. 4, the results are depicted graphically as box-plot diagrams. Additional observations, e.g., on the behaviour of the mice during the procedures of the experiments, are described in the Supplements.
Table 3. Overview of the statistical results of the CPP experiments.
see "Table 3.pdf"
All results were normally distributed, so either a one-sample t-test (baseline vs. chance and test vs. chance) or a paired t-test (baseline vs. test) was conducted. As the experiments were considered preliminary studies, the results were not Bonferroni corrected. The number of animals is the number of animals after exclusion (in experiments 1 and 2, one mouse had to be excluded). All percentages are given as mean and standard deviation. exp. = experiment, n = number of animals, baseline = baseline vs. chance, test = test vs. chance, pre vs. after = pre-conditioning (baseline) vs. after conditioning (test), US = unconditioned stimulus, CS = conditioned stimulus, l = left, r = right; other abbreviations in the column "average" correspond to the stimuli in the column "what".

Experiment 1
Procedure
In the first CPP, two experimental procedures, weighing and fixation (restraint by hand), were compared using setup 1, which consists of two cages connected via a tunnel with an automatic positioning system based on light barriers. The CS, two different bedding materials, were presented before and after the US.
Weighing was conducted in the same manner as during the weekly cage cleaning and was expected to have a neutral effect. Fixation was conducted on a lid, as for health monitoring or Meloxicam application before transponder implantation, and was expected to have an aversive effect. The experiment had a 10 day schedule (see section General Procedure).
Baseline test, conditioning sessions and final preference test were not conducted in the husbandry room but in a separate experimental room. Mice (n = 13) were placed in a transportation cage and moved to the experimental room. There, the mice were given 30 min to habituate to the environment. After the experimental procedures, mice were transported back to the husbandry room and returned to their home cage system.
During baseline and final preference test, mice were placed in one of the connected cages. Mice were free to move between cages, and their position was recorded for 10 min by an automated positioning system based on light barriers installed around the connecting tunnel (for more details see section Setup 1: Two cages connected via a tunnel).
For the conditioning sessions, the mouse was taken out of the transportation cage and placed in a single cage (not belonging to the setup) with bedding material. Each mouse stayed in this conditioning cage (with the respective bedding material) for 5 min before the respective procedure was performed (fixation or weighing). If the mouse was hesitant to leave the tunnel (into the glass jar for weighing or onto the lid for fixation), the tunnel was turned or tilted until the mouse slid gently out of it. After the experimental procedure, the mouse was again placed into the conditioning compartment for 5 min. Afterwards, it was returned to the transportation cage.

Results
Already during the baseline test, mice had a strong preference for one of the cues: Mice spent significantly more time in the cage with the comfort white bedding than in the cage with the pure bedding (59.95 ± 7.54 %; p < 0.001, t = -4.57). However, in this first experiment the analysis was not conducted before the start of the conditioning sessions, so we continued with the experiment unaware of the bias. In the final preference test after the conditioning sessions, the preference for comfort white was unchanged (see Fig. 3, 64.88 ± 9.52 %; p < 0.001, t = -5.4159). Neither a preference for a US nor a side bias was observed. Thus, pairing the bedding material with a supposedly negative (fixation) or a neutral (weighing) procedure had no impact on the preference for the bedding material.

Fig. 3. Depicted is the time spent (in percent) on the bedding paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS-US pairing) and trial: In the habituation (= baseline) trial, the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: fixation or weighing, CS: comfort white or pure bedding material.

Experiment 2
Procedure
Because the results from experiment 1 indicated that the CS used (bedding material) had too strong an influence in itself, we basically repeated experiment 1 as a second experiment with the same group of mice (group 1), now using different flooring material: two different types of gravel. The same two experimental procedures, weighing and fixation (restraint by hand), were compared, and we used the same setup (setup 1, two cages connected via a tunnel with an automatic positioning system based on light barriers). Because some studies report opposing conditioning effects depending on whether the CS is presented before or after the US (see section Introduction and section Timing of US and CS), we now presented it only before the US (in contrast to experiment 1). We also changed the time schedule from 10 days to 6 days. Moreover, conditioning sessions were shortened from 5 to 3 min to prevent prolongation of the procedure into the afternoon. Everything else resembled experiment 1, i.e., for the experiment, mice were transported to a separate experimental room (the same room as in experiment 1), where they had 30 min of habituation to the new environment. The conditioning, the habituation and the final test were also conducted similarly to experiment 1.

Results
This time, to be sure that there was no initial cue preference, the baseline preference was tested and analysed before proceeding to the conditioning. A first comparison between pumice and marble gravel revealed a preference for pumice (64.85 ± 10.78 %, p < 0.001, t = 4.9692), probably because marble gravel is colder to the touch than pumice gravel with its air enclosures. In a second baseline test, marble gravel was kept but this time compared to quartz gravel. Here, mice showed no preference for either of the materials (quartz: 53.08 ± 10.95 %; p = 0.3731, t = 0.93244). Thus, we used these two materials for the conditioning.
It has to be noted that, just by the random matching of mice and US-CS pairing (which US was paired with which CS), there was a statistically significant preference for one of the procedure compartments before the actual conditioning procedure (weighing compartment: 56.98 ± 8.77 %; p < 0.05, t = -2.6394). However, this preference was not apparent in the final test after the conditioning (weighing compartment: 53.04 ± 19.48 %; p = 0.6164, t = -0.51702). No preference for cue or side was found. As can be seen in Fig. 4, the standard deviation in the final test was roughly twice that of the baseline test.

Experiment 3
Procedure
In the third CPP, to further reduce the influence which the CS (gravel) seemed to have in experiment 2, we changed to plates as CS. In addition, we changed the setup to a more common design: setup 2, consisting of a cage divided by a small or large barrier into two compartments with differing CS (based on [2]). Because of the setup change, the formerly automatic analysis was now done manually by analysing video recordings.
The CS was again presented before the US. We used a 6 day schedule and the same group of mice (group 1, n = 12). For the experiment, mice were transported to a separate experimental room (the same as in experiments 1 and 2). We used the same experimental procedures as before, weighing and fixation (restraint by hand). However, as we considered that the surfaces on which the procedures were performed might be a CS themselves (especially as mice had prominent contact with the surface during fixation), we harmonized them: Instead of using different containers for the two procedures (experiments 1 and 2, weighing: glass, fixation: lid), we now used a small cage. For weighing it was used the right way round, and for fixation it was turned upside down. Between mice, this cage was cleaned with 70 % ethanol.
In addition, when the mouse was placed on the surface by tunnel handling, the back opening of the tunnel was sealed by hand and we waited until the mouse left it by itself (in rare cases, this took up to a minute). On the last conditioning day, the latency to leave the tunnel for fixation or weighing was recorded.
During the experiment, it was noted that both plates in the conditioning setup got noticeably cold after the cleaning with ethanol between mice (for more details see section Conditioned Stimuli and Supplements). Still, we continued with the planned procedure.

Results
The baseline test revealed no preference for any of the factors cue, side or procedure compartment. Comparing the baseline results visually with the baseline results of the previous CPP experiments (using setup 1), it is noticeable that the standard deviation is smaller than before. However, the standard deviation became larger again during the final test (see Fig. 5).
In the final test, there was likewise no preference of the mice for either of the procedure compartments, cue or side (see Table 3). Thus, although we took special care to harmonize the experimental procedures with regard to the surface in order to exclude it as an involuntary CS, no preference for a procedure compartment was measurable. It is possible that the temperature of the plates after cleaning with ethanol affected the conditioning procedure.

Fig. 5. Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS-US pairing) and trial: In the habituation (= baseline) trial, the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: fixation or weighing, CS: holes or slits metal flooring plates.

Experiment 4
Procedure
For the fourth CPP, we used group 2 (n = 12), which was naive to the conditioning procedure. We kept the setup from before (setup 2; one cage with two compartments with differing CS, separated by a small or large barrier depending on the experiment phase, i.e., habituation, conditioning or test) and the same CS, i.e., metal plates with holes or slits. However, we changed the US (procedures) because we considered that the nature of the fixation procedure (pressing the mouse onto the surface) might suppress learning of any CS other than the surface itself. Thus, we now compared weighing (as performed during the weekly cage cleaning, supposedly neutral) and food reward using millet (supposedly positive). We used a 6 day schedule, and for the experiment, mice were transported to a separate experimental room (as in experiments 1 to 3).
In the previous experiment, mice defecated and urinated a lot in the setup, which could be a sign of fear [35,50]. To reduce this, group 2 was habituated to the setup on three days for 1 min in their husbandry room before the baseline test. Note that many studies (including the protocol by [2]) do not perform additional habituation sessions beyond the baseline measurement, which was the reason why we did not perform habituation sessions in the preceding experiments.
To ensure that the mice would consume the millet during the conditioning sessions, mice were habituated to it: On 3 days before the start of the experiment, in total 6 g of millet were placed into the home cage in the morning after the onset of the light phase.
The glass jar for weighing was the same as used in experiments 1 and 2 but unfamiliar to the new group until the start of the experiment (i.e., a different container was used for the weekly weighing). The weighing procedure itself was performed as during the weekly cleaning procedure (see section Weighing). For the food reward procedure, a mouse was placed into a type III cage filled with bedding material (similar to the home cage), which had 0.1 g millet at one end of the cage. During both procedures, the time to leave the tunnel onto the surface and the time at which the mouse re-entered the tunnel after the procedure were noted for all conditioning sessions. In addition, during the millet procedure, it was also noted when the mouse began feeding and when it stopped. This was done to see whether a change in behaviour occurred over time.
As we had noted in experiment 3 that the metal plates used as CS got noticeably cold directly after cleaning with 70 % ethanol, in this experiment we waited at least 90 s after the cleaning before a mouse was placed onto the plates. The rest of the cleaning procedures remained the same.

Results
There was no preference for procedure compartment, cue or side during the baseline test. In the final preference test, there was a tendency towards a preference for the slit plate (cue), although it did not reach significance (see Fig. 6, 54.07 ± 7.02 %; p = 0.0696, t = 2.01). In addition, comparing the results before and after conditioning, there was a tendency towards an increased side preference (p = 0.08055, t = -1.9244).
However, there was no preference for a procedure compartment (millet compartment: 51.27 ± 8.10 %; p = 0.5993, t = 0.54097). This time, the standard deviations of baseline and final test were similar.

Fig. 6. Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS-US pairing) and trial: In the habituation (= baseline) trial, the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: millet or weighing, CS: holes vs. slits metal flooring plates.

Experiment 5
Procedure
Between CPP experiments 4 and 5, we conducted two other experiments, using approaches other than CPP to compare the effect of procedures. A description of them can be found in the Supplements (experiments 4.1 and 4.2).
Looking at the preceding CPP experiments and their inconclusive results, it seemed clear to us that a fundamental element in the experiments was not working. Therefore, we decided to reproduce a "basic" CPP protocol as closely as possible. Only after successfully reproducing the protocol did we want to move on and alter it to compare experimental procedures.
To do so, we used fluids as US instead of experimental procedures, similar to [9,10,51]. Here, we compared tap water and almond milk, as we already knew from other studies in our research group [52] that almond milk is a preferred good. We used setup 2 (one cage with two compartments with differing CS, separated by a small or large barrier, similar to [2]) and group 2 again (n = 12). Thus, the mice were already habituated to the setup (in general, not to the CS). In addition, visual patterns were used as CS (designed as described by [2]). The CS were presented simultaneously with the US. We again used the 6 day schedule. Baseline test, conditioning sessions and final preference test were conducted in a separate experimental room (for logistic reasons not the same as in the experiments before, but with similar conditions).
For the conditioning sessions, each mouse was taken out of the transportation cage and placed in one compartment of the setup which had a visual pattern (CS) underneath the compartment floor. A Perspex plate with a hole was placed on top of the compartment, and the nipple of the fluid bottle was inserted through the hole. Mice were filmed to monitor whether they drank the fluids. After 3 min, the mouse was returned to the transportation cage. During baseline and final preference test, mice were placed into one half of the conditioning cage and their activity was recorded for 10 min by a webcam.

Results
During the baseline test, no preference was found for side, pattern or the procedure compartment. During the sessions, it was noted that mice seldom tested the fluids and never actually drank them. In addition, due to dribbling of the bottles, the floor was unexpectedly wet.
In the final preference test, there was also no side, cue, or fluid preference (see Table 3). Comparing the results from the baseline test (pre-conditioning) with the final preference test (post-conditioning), there was no change in the duration of stay on the pattern paired with almond milk (procedure compartment) or on a side. Interestingly, however, there was a significant increase in the duration of stay on the dot pattern (cue; see Fig. 7, p < 0.01, t = -3.2257).
It has to be noted that not all of the mice tested the fluid during the conditioning sessions and only one mouse was observed actually drinking. Thus, the actual pairing of US and CS might not have taken place because the US was not experienced (for more details on behavioural observations during the experiment see Supplements).

Experiment 6
Procedure
As the consumption of the fluids during the preceding experiment was low, a series of pre-tests was performed before CPP experiment 6 with the same group of mice (group 2, n = 12) to determine under which conditions almond milk or millet were consumed by the mice in a new environment. We concluded that habituation (or the lack thereof) might play an important role.
Therefore, we repeated an alternate version of experiment 5: The whole experiment was now conducted in the housing room to avoid a new environment. Thus, instead of placing the mice first into a transportation cage, mice were directly taken out of their home cage to be placed in the conditioning setup. To facilitate the handling of the mice (placing them in or taking them out of the home cage), the filter tops of the home cage system were removed at the start of each experimental day. The mice were given 10 min to habituate to the changed light condition before the start of the experiment. The filter tops were put back on top of the cages after the last mouse had finished its session.

Figure caption: Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS-US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: almond milk or water, CS: dots or stripes visual pattern.
During the pre-tests with group 2, we observed that even after habituation to the cage, millet was more readily consumed than almond milk.As our main goal for this experiment was to find a working protocol which then could be altered later to specific procedures which should be compared, we changed from using fluids to millet.Thus, as US either millet mixed with bedding material (= millet) or only bedding material (= no millet) was provided.
Mice were already familiar with millet and feeding in a type III cage due to the pre-tests.We again used setup 2 (one cage with two compartments with differing CS, separated by a small or large barrier), which is very similar to the type III cage of the pre-tests.Two self-designed patterns consisting of black and white blocks leading to a fabric texture-like and a chessboard pattern (with the same amount of black and white in total) were used as CS.
We returned to a 10 day schedule as conducted in experiment 1 (similar to the protocol of [2]), with 1 day for the baseline test, 8 conditioning days with one conditioning session per day, and 1 day for the final preference test. On the conditioning days, US 1 and 2 were presented alternately. That is, on conditioning days 1, 3, 5 and 7 half of the group experienced millet, and on days 2, 4, 6 and 8 no millet. For the other half of the group it was the other way around.
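The alternating, counterbalanced conditioning schedule described above can be written down as a small sketch. How individual mice were assigned to the two halves of the group is not specified here, so the split by list position, the mouse identifiers and the function name `conditioning_schedule` are assumptions made only for illustration.

```python
def conditioning_schedule(mouse_ids, n_days=8, us=("millet", "no millet")):
    """Alternate two US over the conditioning days, counterbalanced across halves.

    The first half of the list starts with us[0] on day 1, the second half
    with us[1]. The assignment by list position is an assumption.
    """
    half = len(mouse_ids) // 2
    schedule = {}
    for index, mouse in enumerate(mouse_ids):
        offset = 0 if index < half else 1
        schedule[mouse] = [us[(day + offset) % 2] for day in range(n_days)]
    return schedule


# Hypothetical identifiers for a group of 12 mice.
plan = conditioning_schedule([f"mouse_{i:02d}" for i in range(1, 13)])
for mouse, days in plan.items():
    print(mouse, days)
```

With this layout every mouse receives each US on four of the eight conditioning days, and on any given day half of the group experiences each US.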
During conditioning sessions, mice were placed for 3 min individually into one half of the conditioning setup (with the respective visual pattern underneath).A Perspex plate with a hole was placed on top of the compartment.On the floor of the cage, a small amount of bedding material (as much as fits into a 0.5 ml Eppendorf tube) mixed either with 0.1 g millet or no millet was available for the mice.To monitor whether the mice consumed the millet, sessions were video recorded.In the baseline and final preference test, mice were placed into one half of the setup with only a small barrier between compartments and activity was recorded for 10 min by a webcam.

Results
During the baseline test, there was no preference for side, pattern or procedure compartment (see Table 3). During conditioning sessions, mice readily ate the millet (when provided). However, there was also no preference for procedure compartment, cue or side in the final preference test (see Fig. 8). There was no change in duration of stay comparing pre- and post-conditioning.

Experiment 7
Procedure
Possible confounding factors of experiment 6 might have been that the patterns were not easily distinguishable for the mice. In addition, it is possible that the mice have to experience the onset of the experimental procedure in the respective compartment for a successful conditioning. This is based on thoughts from a study by [53] and similar to foraging strategies as discussed already by [54]: Why should the mouse "wait" in the final test near the millet-CS when it never experienced the "filling" of the millet? In this case, the mouse might expect that the consumed millet will not be refilled, and therefore, the former millet-environment might be considered as empty as the no-millet-environment.
As a consequence, the procedure for experiment 7 was similar to the procedure in experiment 6, with the following alterations: Firstly, to reduce potential visual influence from the outside environment, we used a new setup (setup 3) instead of setup 2, which had opaque instead of transparent walls (based on [16]).The compartments were now separated by a wall either with or without a hole, which reduced the view into the other compartment.
Secondly, we used a new group of mice, which was naive to the CPP test (group 3, n = 12).Because this group was also naive to the CPP setup, we had four 3 min sessions of habituation (without the CS).This was done to ensure that mice would feed on the millet during the actual experiment.
Thirdly, visual patterns of horizontal or vertical black and white stripes were used as CS (similar to [13]). These patterns were already used by [44] and validated as distinguishable for C57BL/6 mice of the respective age. The patterns were applied to three of four walls of each compartment (compare setup 2: CS on the floor).
Fourthly, mice were first placed inside the compartment without any US being present.Only after 30 s the US was added: either 0.1 g millet or a visually similar amount of bedding material.With this, we wanted to ensure that the patterns were perceived before the US was presented.
Apart from that, we used the same 10 day schedule as in experiment 6. Procedures were conducted in the husbandry room (for more details see experiment 6), and the cleaning procedure remained also the same as in experiment 6.

Results
In the baseline test, there was a significant preference found for the left half but not for pattern or procedure compartment. (Note that the setup here, in comparison to the last experiment, had opaque walls and therefore should have excluded environmental effects to an even higher degree.) We reasoned that an effective conditioning should erase the side preference and thus continued with the test without additional changes.
However, in the final preference test, the significant side preference was still apparent (see Fig. 9, left: 56.38 ± 7.38 %; p = 0.01224, t = 2.9925), while there was no pattern preference (horizontal stripes: 50.10 ± 9.95 %; p = 0.9718, t = -0.036123) and only a tendency towards procedure preference (millet: 54.83 ± 8.57 %; p = 0.07697, t = 1.9511). Comparing the results from the baseline test (pre-conditioning) with the final preference test (post-conditioning), the preference did not change for any of the factors (including procedure compartment).
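To make the comparison against chance level more tangible, the following sketch recomputes a t-statistic from the summary values given above. It assumes that the reported values are means ± standard deviations, that the test was a one-sample t-test against 50 %, and that all 12 mice of the group entered the analysis; none of this is stated explicitly in this section, and `one_sample_t` is a hypothetical helper.

```python
import math


def one_sample_t(mean_pct, sd_pct, n, chance=50.0):
    """t-statistic of a sample mean against a fixed chance level."""
    standard_error = sd_pct / math.sqrt(n)
    return (mean_pct - chance) / standard_error


# Summary values taken from the text (assumed: mean ± SD, n = 12).
t_side = one_sample_t(56.38, 7.38, 12)    # left side
t_millet = one_sample_t(54.83, 8.57, 12)  # millet-paired compartment

print(f"side:   t = {t_side:.3f}")
print(f"millet: t = {t_millet:.3f}")
```

Under these assumptions the recomputed values land close to the reported t values for side and millet; small deviations would be expected because the summary values are rounded.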

Experiment 8
Procedure
Until now, all CPP experiments were designed as forward conditioning, meaning first the CS was presented and then the US (except for experiment 1, where the CS was presented both before and after the US). By first presenting the US and then the CS, backward conditioning is also possible [13,24,25,55,56]. This was aimed for in experiment 8.
We used group 3 (n = 12), setup 3 (opaque walls and compartments separated by a wall either with or without a hole) and a 10 day schedule. Due to the altered US-CS timing, the US procedures were adapted: As a supposedly positive procedure (millet), mice were placed individually in a bedding filled cage, into which 0.1 g millet was immediately given. Mice had 1 min to consume the millet before they were taken out of the cage again. As a supposedly negative, stressful procedure (restrainer, [33,57]), mice were placed in a bedding filled cage and then immediately transferred into a restrainer, in which they had to stay for 1 min.
Because the mice obviously did not learn any associations with the CS in the experiment before, we used the same visual CS (horizontal or vertical stripes). However, we made sure that the CS formerly paired with the positive procedure (millet) for each mouse was now paired with the negative, stressful procedure.

Figure caption: Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS-US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: millet or bedding, CS: vertical or horizontal stripes as visual pattern.

Experimental procedures were conducted in the husbandry room (for more details see experiment 6). Mice were directly taken out of their home cage and placed in the cage in which the procedure (millet or restrainer) took place. After the procedure, mice were transferred into the conditioning setup, in which they stayed for 3 min, before they were taken out and returned to their home cage.
Each mouse had its own cage with bedding material for the experimental method, and cages were cleaned with 70 % ethanol and filled with new bedding material between experimental days.Between experimental days, the restrainer was cleaned with special tissues free from alcohol and aldehydes (Microbac Tissues, Hartmann, Germany) and washed afterwards with water.

Results
In the baseline test, there was no significant preference found for side, pattern or the procedure compartment (see Table 3). In the final preference test, there was again no significant side, pattern or procedure compartment preference (see Fig. 10). Comparing the results from the baseline test (pre-conditioning) with the final preference test (post-conditioning), the duration of stay near the vertical pattern increased (p < 0.01, t = -3.1647).

Experiment 9
Procedure
Between experiment 8 and 9, one additional experiment took place (not based on CPP, see Supplements experiment 8.1). Experiment 9 was conceptualized as a repetition of experiment 8 with a different group of mice (group 4, n = 12), which were fully whiskered (in contrast to group 3). We used setup 3 (opaque walls and compartments separated by a wall either with or without a hole) with the same visual CS (horizontal or vertical stripes). Moreover, an additional CS was added to the floor to potentially increase the conditioning effect ([19]; tactile cues might overshadow spatial cues: [3]).

Figure caption: Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS-US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: millet or restrainer, CS: vertical or horizontal stripes as visual pattern.

For this, plates with tactile structures were used.The tactile structures were based on the study by [48] with an 8-arm maze.
The procedures used as an US (millet as a supposedly positive procedure and restraint as a supposedly negative procedure) were only slightly altered: This time, mice were placed in a type II cage (LWH: 225 x 167 x 140 mm, Tecniplast) without bedding.The floor was covered with a white plate with protruding bars or dots (3D printed, PLA).
Mice were placed inside the procedure environment (type II cage) by tunnel handling.Immediately after entering the cage, either 0.1 g millet were added or the mice were transferred into a restrainer.After 1 min, the mouse was either taken out of the cage by tunnel handling and placed into the conditioning setup (millet), or the restrainer was placed into the conditioning setup and opened to release the mice (restrainer).
The conduction of the procedure (US) was video recorded from the side, to record the time the mice took to exit the handling tunnel and enter the cage (for more details on the video analysis see Supplements).This was done to investigate whether mice formed an association between the tactile floor cues and the procedure.If so, we expected mice to show a greater hesitancy to leave the handling tunnel onto the floor combined with the negative procedure, compared to the positive procedure.
As mice were unfamiliar with the setup, we had 4 consecutive days of habituation, in which mice were habituated to the procedure environment and the conditioning setup: During these habituation sessions, the procedure environment (type II cage) contained a smooth white plate without additional tactile structures, while the conditioning setup contained smooth black plates without additional tactile structures and no visual wall patterns (plain grey walls). The habituation took place in the morning after the onset of the light. During the whole procedure, the filter top was taken off the home cage, which is why before the start of the procedure mice had 10 min to adjust to the altered illumination. Mice were placed individually for 3 min into the procedure environment and then 3 min into the conditioning setup. Afterwards, they were returned to their home cage. To reduce the time for habituation, we interlaced the procedures, i.e., while one mouse was habituating to the conditioning setup, another mouse was habituating to the procedure environment. Habituation to the millet was not necessary as mice experienced millet as part of an active enrichment in their home cage (see section Housing).
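The interlacing of the two habituation steps can be sketched as a simple two-station timeline. The sketch assumes fixed 3 min slots, ignores handling and transfer times, and uses hypothetical mouse identifiers; `interlaced_timeline` and `total_minutes` are illustrative names only.

```python
def interlaced_timeline(mouse_ids, slot_min=3):
    """Return (start_minute, mouse, station) events for two interlaced stations.

    Each mouse spends one slot in the procedure environment and the next slot
    in the conditioning setup, while the following mouse already starts in the
    procedure environment.
    """
    events = []
    for index, mouse in enumerate(mouse_ids):
        start = index * slot_min
        events.append((start, mouse, "procedure environment"))
        events.append((start + slot_min, mouse, "conditioning setup"))
    return sorted(events)


def total_minutes(n_mice, slot_min=3, interlaced=True):
    """Total duration with or without interlacing (handling time ignored)."""
    slots = n_mice + 1 if interlaced else 2 * n_mice
    return slots * slot_min


mice = [f"mouse_{i:02d}" for i in range(1, 13)]  # hypothetical identifiers
for start, mouse, station in interlaced_timeline(mice)[:4]:
    print(f"{start:>2} min  {mouse}  {station}")
print(total_minutes(12), "min interlaced vs.", total_minutes(12, interlaced=False), "min sequential")
```

Under these simplifying assumptions, interlacing shortens a habituation day for 12 mice from 24 slots to 13 slots, because both stations are occupied in all but the first and the last slot.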
The procedure environment and conditioning setup were cleaned with 70 % ethanol before or after each experimental (and habituation) day.Between mice, procedure environment and conditioning setup were only cleaned if faeces or urine were present, and cleaning was done just with water.Between experimental days, the restrainer was cleaned with special tissues free from alcohol and aldehydes (Microbac Tissues, Hartmann, Germany) and afterwards washed with water.

Results
In the baseline test, there was a significant preference for side (left: 56.65 ± 10.42 %; p < 0.05, t = 2.2106), although we used opaque walls and similar light conditions.We argued that this preference should be overcome by a successful conditioning and continued with this setup.
In the final preference test, there was a tendency towards a preference for the restrainer procedure compartment (see Fig. 11, 54.63 ± 8.70 %; p = 0.09202, t = -1.8456). However, if one compares the results from pre- and post-conditioning, it becomes obvious that there is merely a reduction of standard deviation but no significant increase of preference. Instead, the side preference became more pronounced (left: 56.82 ± 6.97 %; p < 0.01, t = 3.3902). In addition, we did not find a difference between experimental procedures regarding the latency to leave the handling tunnel (for more details see Supplements).

Figure caption: Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS-US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: millet or restrainer, CS: vertical or horizontal stripes as visual pattern. Note that in this experiment the visual pattern was also combined with a fixed tactile cue.

Discussion
Conditioned place preference / aversion tests can be conducted in multiple ways. Here, we aimed at developing a CPP protocol to pair an experimental procedure as US with a NS → CS to test the emotional valence of the experimental procedure. In this manner, it should be feasible to compare different experimental procedures with regard to their severity from the mice's perspective.
However, finding such a protocol proved to be difficult. As we have described above, we took several approaches, none of them leading to the desired effect. Nevertheless, we would like to emphasise some conclusions we can draw from these experiments which might be helpful for future research.

Choice of CS
In general, stimuli are not equally effective for all kinds of US [21]; therefore, the choice of cues is very important.In the following, we will discuss the used cues; note that as conditioning did not work, they are not per se "CS", but remained NS.However, to simplify the discussion, we will nevertheless speak of CS.
In experiment 1 and 2, we chose different flooring materials (bedding and gravel) as CS. This was successfully done before in other studies [29,47,58]. Flooring materials combine multiple cues, including visual, tactile and olfactory cues, which can improve CPP acquisition [20]. Thus, mice should have been able to easily discriminate between the two chosen CS in this experiment, which can be confirmed as the mice showed a significant preference for one of the CS from the start. However, this pre-preference is a disadvantage as it might affect the conditioning (e.g., by working as a positive US itself). Thus, although there are some studies, like [13], which did not perform a baseline preference test, we highly recommend a baseline test to ensure that the preconditions are the same. In addition, it should be analysed before the beginning of the conditioning sessions, to be able to change the CS and start again. In general, of course, a preference for one of the CS can also develop during conditioning sessions, independently from the effect of the US. Especially if the mice experience one material as more interesting than the other, a preference formation for one of the CS might be facilitated, as was the case here using flooring material which can be manipulated by the mice (e.g., gnawing, digging, see Supplements for observations during conditioning sessions). Thus, although flooring materials such as bedding material or gravel provide multiple useful cues, we would not recommend them for future experiments.
In experiment 3 and 4, metal plates were used as CS. Especially mesh or grid floors are frequently used as CS in other studies (mesh: [11,18], grid: [17,51], mesh vs. grid: [8]). The metal plates in our experiments contained a visual and a tactile cue (grid vs. holes, similar to the protocol of [2]). As the metal was very bright, however, the visual cues did not have a high contrast. Moreover, the metal combined with the ethanol cleaning led to an unexpected additional cue: temperature. The plates became very cold after cleaning, which might have caused a difference between conditioning sessions with the same plate, depending on how much time passed between cleaning and entering of the next mouse. To prevent this, in experiment 4 we set a fixed time between the two procedures. However, there might still have been a difference in temperature between the plates. In this case, the temperature of the plates might interfere with the conditioning, as it was shown that different ambient temperatures can affect the formation of place preference [59]. It is unclear how other studies handled this cleaning-temperature problem as, for example, in the protocol by [2], in which metal plates are also recommended, there is no instruction regarding cleaning at all.
As a consequence, in the subsequent experiments visual cues as described in the protocol by [2] (experiment 5) or similar to [12,13] (experiment 7 to 9) were used. The latter patterns were chosen especially because [44] showed that even rather old mice are capable of distinguishing these patterns. In experiment 6, we used a new, self-designed set of patterns. We took special care that they would contain equal amounts of black and white squares without being too similar. However, as it was a self-designed pattern without reference from any other study, and there was no distinct result in preference, we cannot say if the mice were able to discriminate between the two patterns. If mice could not distinguish the cues, however, this might be a factor causing the resulting lack of preference (for cue or paired US compartment). Thus, using stimuli which cause no baseline preference is important, but it is equally important to ensure that the mice are able to distinguish (and memorize) the two patterns.
In general, the provided CS should match the abilities of the mice.In several of our experiments, some (experiment 5 and 6) or all mice (experiment 7 and 8) had reduced whiskers, and in these experiments, we took care to use visual cues instead of tactile ones (see also section Whisker-loss).
In addition, it is possible that mice perceived cues as important that were not intended. For example, a part of the experimental procedure itself might have been experienced as a more prominent CS which overshadowed other CS (for more see section Choice of US). The cleaning method might also have worked as an olfactory cue. In experiments 1-5, we cleaned the setup with ethanol between mice to erase any possible olfactory cues from previous mice, and thereby also any potential influence from previous mice ([60], rats: [61]). However, the ethanol odour might have had an influence on habituation and stress (see section Habituation). Moreover, the strength of the remaining smell might differ between sessions. For this reason, in experiments 6-9, it was an important part of the procedure that the mice would consume the millet. To facilitate this behaviour, we refrained from disinfection between mice and just cleaned with water when necessary. It would have been helpful to have recommendations from other studies on how cleaning is handled. But at least in the field of CPP studies, the cleaning procedure is usually not reported (examples: [2,16], exception: [62] recommends cleaning between animals with water and the soap used for cleaning of the home cages). Especially as, in contrast to our experiments, these published studies were successful in their conditioning, the missing information on cleaning poses a problem.

Choice of US
The experiments we conducted here were preliminary and served to develop a protocol to use CPP for severity assessment. In other words: In contrast to other experiments, we did not want to know the effect of a specific US (if it is aversive or attractive), but had to choose a US with a predictable effect to validate our protocol. For example, we used restraint or fixation as a negative US, weighing as a neutral US, or a food reward (millet) as a positive US. In this manner, we investigated whether the experiment protocol itself worked, before moving on to comparisons of procedures with uncertain outcomes.
However, some tested experimental procedures seemed not to be sufficient with regard to successful conditioning and might therefore have yielded no measurable effect. For example, in experiment 5, we used almond milk as a positive reinforcer. We knew from other studies in our research group ([52], also [37]: presentation of fluids in the home cage) that mice prefer almond milk over other fluids and are also willing to work for access to it. However, during conditioning sessions, not all mice tested the fluid and only one mouse actually drank from it. This might have been related to insufficient habituation (see section Habituation), meaning that mice refrained from testing the fluid in an environment differing from their home cage. As a result, without experiencing the US, pairing of US and CS was probably unsuccessful. For this reason, we repeated this experiment with improved habituation and a different reward (millet) (experiment 6), as we knew from pre-tests that mice were more ready to consume millet in a new environment.
Another point is that US and CS might influence each other.For example, in experiment 1 and 2, the US (fixation) contained a strong cue itself: the grid which was used to perform the fixation procedure.It is possible that the grid blocked or overshadowed any other cue (including the chosen CS).Blocking occurs if animals learn first the association between one stimulus and a consequence (e.g., grid and fixation), before this first stimulus is accompanied by a second stimulus (e.g., the chosen CS, in experiment 1 and 2: bedding material or gravel).The new stimulus (the chosen CS) will then be perceived as adding no new information and, therefore, conditioning for this new stimulus will be blocked [63].Overshadowing, on the other hand, describes the effect of two simultaneously presented stimuli (e.g., grid and chosen CS) about each of which the animal learns less than if they had been presented alone.Thus, the conditioning response during the test in which only one stimulus is presented (e.g., flooring material as chosen CS) would turn out smaller [63].In this case, it might be advisable to use no second CS but a cue that is already present during the procedure, e.g., use a surface for fixation that is then also present in the CPP setup.
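Blocking and overshadowing as described above can be illustrated with the Rescorla-Wagner learning rule, a standard textbook model of associative learning that was not part of our analysis. The following is a minimal sketch under stated assumptions: the salience and learning-rate values are arbitrary, the cue names "grid" and "flooring" are only labels borrowed from experiment 1 and 2, and every trial is treated as reinforced.

```python
def rescorla_wagner(trials, salience, learning_rate=0.3, max_strength=1.0):
    """Associative strength per cue after a series of reinforced trials.

    trials: list of tuples naming the cues present on each trial.
    salience: dict mapping cue name to its salience (arbitrary values here).
    """
    strength = {cue: 0.0 for cue in salience}
    for present_cues in trials:
        # All cues present on a trial share one common prediction error.
        prediction = sum(strength[cue] for cue in present_cues)
        error = max_strength - prediction
        for cue in present_cues:
            strength[cue] += salience[cue] * learning_rate * error
    return strength


salience = {"grid": 0.5, "flooring": 0.5}  # illustrative values only

# Flooring cue trained alone.
alone = rescorla_wagner([("flooring",)] * 50, salience)
# Overshadowing: grid and flooring always presented together.
overshadowed = rescorla_wagner([("grid", "flooring")] * 50, salience)
# Blocking: grid is learned first, flooring is added afterwards.
blocked = rescorla_wagner([("grid",)] * 50 + [("grid", "flooring")] * 50, salience)

for label, result in (("alone", alone), ("overshadowed", overshadowed), ("blocked", blocked)):
    print(f"{label:>12}: flooring strength = {result['flooring']:.3f}")
```

In this toy model the flooring cue acquires nearly full strength when trained alone, roughly half of it when always presented together with an equally salient grid cue, and almost none when the grid has already been learned beforehand. Whether such a model describes the situation in our experiments is an open question.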
It also has to be considered that the US we chose for our experiments might have been too weak in their effect, causing no distinct preference or aversion. As explained in the introduction, we refrained from using more severe experimental procedures as US as we wanted to be able to detect subtle changes in severity, not only large ones. Still, after conducting several experiments without the expected results, we have to consider that CPP experiments might not be suitable for the measurement of mild effects. In the review of [64], the authors discuss what is actually conditioned in a CPP. One of the theories states that mice show some kind of "superstitious behaviour" after conditioning because they want to repeat the behaviour (i.e., being close to the respective CS) which last led to the positive reinforcement [64]. Thus, using strong reinforcers might cause a stronger behavioural response. In our case, with a weaker reinforcer, the pull towards the CS might also be weaker and, as a result, might not cause the expected distinct differences in duration of stay in the two compartments.
Possible alterations to potentially increase the effect of the US could be, e.g., prolonging the time the mice spent in the restrainer or conducting food deprivation before millet presentation.As CPP protocols using experimental procedures are hard to find, we can only refer to similar studies, which, however, also differ in their approach.For example, to measure the effect of wheel running with a CPP test, some studies used water deprivation [12,13], while others did not (golden hamsters: [14]).
In general, it is also possible that mice might have learned an association between the US and some unintentional, unknown stimulus. This theory is based on behaviours of the mice which we observed especially during the first experiments: Mice were more hesitant to leave the handling tunnel right before the (negative) US was conducted. This observation is described in the Supplements and will be discussed in section Latency to Leave Tunnel.

Choice of Setup
In the course of this study, we used three different setups. Setup 1 consisted of two cages connected by a tube with an automatic system detecting the position. As we used a similar connection system for the home cages, we expected the mice to readily accept this setup. However, in both experiments using this setup (experiment 1 and 2), one mouse (the same for both experiments) changed cages during the baseline test but not during the preference test. Thus, we had to exclude the mouse because we could not determine whether the mouse was aware of both options. It is unclear why the mouse behaved in this way, and if, for example, this was a sign of stress.

The next setup we tried consisted of one cage separated into two halves by a small barrier (experiments 3-6). This setup was based on the setup used by [2]. The barrier was added a) to have a visual segregation of the compartments for the mice, and b) to facilitate the analysis (determining to which half of the cage the mice belonged). In this setup mice changed compartments more often. Indeed, one might consider the question whether the mice actually perceived the setup not as two compartments but as one large compartment (with an obstacle in the middle). This could imply that they did not experience the situation as providing a choice. In addition, without a visual barrier between the compartments, it can be argued that wherever the mice were staying, they still had (visual) contact with the other compartment: In experiment 3 and 4 we used tactile and visual cues (metal plates), while in experiment 5 and 6, using the same setup, we had visual cues only (patterns). As a result of conditioning, mice should seek out the CS which they associate with the preferred US [64]. However, in this setup, mice might not need to stay in one specific compartment to do so, because the cue (pattern) is visible from both compartments. This might not have been essential for other CPP studies (otherwise [2] would probably not recommend this setup) but here, with a potentially weak US (as discussed in the previous section), the behavioural response to be as close as possible might be weaker. Following this line of thought, we reduced the (visual) contact with the CS of the second compartment in setup 3. This setup resembled the description of [12] and [17].
Note that setup 1 and 2 both had transparent walls, which enabled mice to see external visual (spatial) cues. During the course of the experiments, care was taken not to change any external cues which were visible for the mice. In setup 1, external visual cues should not have played an important role as we used a one compartment design during conditioning sessions (argumentation see [3] and [19]). In setup 2, however, these external cues might also have had an influence. Interestingly, in 2 out of the 3 experiments in setup 3, there were side (compartment) preferences in the baseline and preference test. Setup 3 had only non-transparent walls, and external visual (spatial) cues were excluded. Thus, potential cues were restricted to the ceiling (more than 2 m above the setup), which contained a symmetric metal structure but no direct lights. It is possible that this structure was not positioned completely symmetrically above the setup. Alternatively, it is also possible that the non-transparent walls caused a difference in lighting of the compartments: Although we took special care to position the setup exactly between the light sources, we did not measure the brightness. If the brightness actually did cause the side preference, it also remains unclear why in experiment 8, which was positioned exactly the same way as experiment 7, this side preference was not observed.
As it can be seen in Table 1, many studies use setups with 3 compartments instead of 2. The advantage of a 3-compartment-setup is that the animal can be placed into a neutral area at the start of the test.The disadvantage, however, could be that the animal spends too much time in this neutral area throughout the test.As we wanted to develop a protocol, which would be suitable also to compare the effect of two aversive procedures (instead of comparing one of them with a neutral control, as is done in most studies), a neutral area would have resembled a refuge in which mice could refrain from making a choice.For this reason, we did not test a 3-compartment-setup.

Habituation
In the course of the experiments, we found habituation to be an important tool. As a first example: In pre-tests between experiments 5 and 6, we observed that a thorough habituation to the US environment was needed for a reward to be used as a US. Otherwise, the mice did not consume the reward (millet or almond milk).
As a second example: In many experiments, we noticed a high defecation and urination rate. This is assumed to be a sign of distress or fear [35,50]. As this also happened during baseline tests (i.e., without presentation of a negative US), it seems to be setup- and/or procedure-related. Our aim was to find a CPP protocol to measure the severity of specific experimental procedures. Thus, a measurement protocol (the CPP) which itself causes distress or fear should be reconsidered.
Unfortunately, observations on urination and defecation during the procedure are not described in other studies, so we cannot compare these observations. However, we assume that this is due to reporting customs and not because this behaviour was not shown.
In general, any sign of stress should not be underestimated, as stress could influence conditioning: For example, acute restraint stress can influence behavioural flexibility such as reversal learning [65] and the retrieval of short-term and long-term memory [66], which is mandatory for conditioning. The stress levels in our experiments should be relatively low compared to acute restraint stress; for example, the mice spent 1 min in a restrainer instead of several minutes up to an hour. Still, we do not know at which stress level the disturbance of memory processes starts. Habituation to the setup and the procedure could be an easy way to reduce stress during the actual CPP procedure.
In general, habituation (or repetition of the same procedure) improves the reliability of behavioural data [67]; in this case, this includes the behaviour during the baseline and final test as well as during conditioning sessions. In other words, the first time the mice are placed into the setup should perhaps not be the baseline recording. If mice are already familiar with the setup and the procedure, novelty-induced variance might be reduced [67], resulting in more reliable data.
However, for habituation in a CPP experiment, two things have to be considered: First, the question arises whether it is possible to "over habituate" the mice. We would argue that this is not possible, as long as the US and CS remain unknown beforehand (in the same manner as in the novel object recognition test; example protocol: [68]). Instead, we can assume that the more familiar the mice are with all other stimuli, the more attentive they should become towards a change. Second, we can expect that habituation is most effective when setup and procedure (handling, timing, environment etc.) resemble the actual CPP protocol as much as possible. Of course, this does not mean that the CS or US to be tested should be familiar to the mice before the beginning of the CPP. Instead, the setup can be used in a similar procedure. For example, in experiment 9, we used plates with tactile structure and wall patterns as visual stimuli as CS during the CPP. During the habituation, we used similar plates, but they were smooth, without additional tactile stimuli, and there were no patterns on the walls (i.e., grey walls instead of ones with black and white stripes). We had 4 days of habituation, in which the procedure was also mimicked, meaning we first placed the mice into the experimental environment for several minutes and then into the conditioning setup. This seemed to be quite effective, as in the baseline test (with the "real" tactile and visual stimuli), only 3 out of 12 mice urinated or defecated. In comparison: In experiment 4, we had habituation sessions, but they were short and not in the experimental room in which the CPP took place later on. Thus, we had not only a different environment but also a differing procedure (no transport beforehand). In this experiment, 10 out of 12 mice urinated or defecated during baseline recording.

Timing of US and CS
Regarding the presentation of US and CS, it has to be decided whether to present the CS before the US, simultaneously with it (both considered to be forward conditioning), or after it (backward conditioning).
In experiment 1, we presented the CS both before and after the US, arguing that we wanted to be "sure" to get an effect. This is, however, misleading because presentation of the CS before or after the US can have opposite effects. For example, in a study with rats, [13] presented one CS before the US (fluid, taste conditioning) and one CS after the US (pattern, place conditioning). They used wheel running as a US, which led to an aversion to the CS presented before the US (taste aversion) but a preference for the CS presented after the US (place preference; for more examples see Introduction). The authors of that study do not give an explanation for this phenomenon; nevertheless, the opponent-process theory could provide a sufficient explanation. In short, a positive US, which has an attractive effect before its onset, can have an aversive effect after its presentation due to the removal of the positive stimulation (and the other way around for a negative US, [69]). Thus, if presenting the CS before the US leads to place preference, and presenting it afterwards to an aversion, the opposite effects might cancel each other out.
Forward conditioning is the more common procedure, whereby presenting first the CS and then the US (as we did in experiments 2-4) is not as common as presenting them simultaneously (see also Table 1). In experiments 5 and 6, we tested this simultaneous presentation of US and CS. However, especially in experiment 6, presentation of the US (in this case millet) did not work as expected: When placed inside the conditioning cage, the mice often immediately started feeding and only later explored the cage (observations during conditioning are described in the Supplements). It is possible that the mice still perceived the cues during feeding. However, this might still have happened only after they had perceived the odour of the millet. Thus, the odour might have blocked the patterns as a CS ([70], also explained in section Choice of US).
To circumvent this, in experiment 7, millet was placed inside the cage with a delay. This approach was inspired by the study of [53], in which conditioned aversion was more prominent if mice first experienced the compartment with the CS and then the onset of the US (in their experiment: water flooding). However, as we found only a slight and insufficient tendency towards a procedure preference after conditioning, the timing might still not have been chosen appropriately. We had 30 s until the US (millet or bedding) was placed in the compartment and 2.5 min afterwards for the mice to consume the millet (if present). Thus, mice were in contact with the CS for some time while the US was already removed (eaten). It is possible that this time span was too long, and the "simultaneous" presentation might not have worked properly.
For experiments 8 and 9, we used a different approach and changed to backward conditioning. However, unlike [13,24,25,56], which also used backward conditioning, we were not able to establish a conditioned place preference or aversion. The main differences between the mentioned experiments and ours are: First, we used mice instead of rats. It is possible that perception and learning differ between those species in ways that might be important here. Second, our procedure times were much shorter. The US (millet or restrainer) took only 1 min, and we confined the mice to the conditioning compartment for 3 min. In the study conducted by [24], 2 h of wheel running were followed by 30 min in the conditioning compartment, [56] had 2.5 h of wheel running with a fixed interval schedule followed by 30 min in the compartment, [13] had 30 min of wheel running followed by 15 min in the compartment, and [25] used 2 or 22 h of wheel running and 30 min in the compartment. All in all, these time frames are much longer than what we used. This might only be necessary for wheel running but not for other US. In addition, confining the animals for 30 min in a small chamber might itself be stressful, not to mention the separation from the group. Especially if we want to use conditioned place preference for severity assessment, it does not seem feasible to use a protocol which itself might already influence the affective state of the animal.

Timing of the General Procedure
As already mentioned in the Introduction, few studies report details on the procedure not specifically related to the conditioning. For example: At what time of day exactly does the conditioning take place? What time difference lies between the first mouse and the last? Is there a transport to a different room? If one mouse is taken out of the home cage, how long before the other mice of its group are taken out, and thus, how long does the overall disruption for the group last?
During our experiments, we noticed that these factors might also play an important role. For example, while the first mice (order was randomized) were always active, the last mice sometimes had to be woken up. This might have influenced their perception.
In all experiments, we used large groups of mice living together in one cage. With this, we can argue that we had no difference in treatment between cages (in comparison to other studies, as we had only one cage), for example, with regard to the length of disruption due to the experiment (transport and stay in the experimental room, duration of removal of the filter top, light changes and so on). On the other hand, the overall disruption of the circadian rhythm might have been larger with this procedure because, in total, the time during which the home cage conditions were affected by the experimental procedure was longer.

Transport
In the course of the experiments, we learned that conducting the procedures in an experimental room (not the husbandry room) has to be considered carefully. First, habituation to the room conditions could never be as profound as for the husbandry room. Second, the transport itself includes several stimuli which might influence the perception of stimuli presented later. Thus, either the transport itself (and everything related to it) needs habituation to become an irrelevant stimulus, or the experiment should best take place in the room in which the animals are kept. Keeping the animals in the experimental room for some days before the start of the experiment and while it is conducted could be considered.

Sound
Some studies, including the protocol by [23], use sound-attenuated chambers. Most studies, however, do not report whether they use them or not. For our experiments, no sound-attenuated chambers were used. Thus, it is possible that sounds from other mice (in the home cage or other mouse groups kept in the same room) might have influenced the results. Experiments 6-9 were conducted in the husbandry room. In experiments 6 and 9, the group participating in the CPP experiment was the only group kept in this room. However, in experiments 7 and 8, a second group (group 2 from previous CPP experiments) was present and involved in a home-cage-based consumer demand test. This involved motor noises from an automatic door. However, we would argue that the mice should have been thoroughly habituated to these noises, and therefore, they should not have functioned as a new stimulus or a stimulus with relevant information on the procedure.

Whisker-loss
Barbering of fur and/or whiskers is a known, chronic problem in C57BL/6J mice [71]. Whisker-loss can lead to altered behaviour, e.g., in the novel object recognition, marble burying and open field tests [40,41]. In addition, barbering mice are seen as a model for the disorder trichotillomania. However, it was shown that these mice show no reduced learning ability, with the exception of an extra-dimensional shift task [42], which is why they should still be suitable for conditioning.
Unfortunately, reporting levels of barbering are very low when it comes to studies unrelated to the investigation of this behaviour. It is therefore unknown how many study results are influenced by it. We here openly report that some of our groups of mice showed whisker-loss. In these experiments, visual (not tactile) patterns were used as CS. We argue that in the studies mentioned above, in which altered behaviour was found, the tactile information was crucial. In our case, with our changed setup, on the other hand, mice should have been able to be conditioned even with missing tactile information.
In addition, we repeated experiment 8, which used a group of mice with whisker-loss, in experiment 9 with a group of mice with intact whiskers. For experiment 8, we had visual stimuli only; for experiment 9, the same visual stimuli and additional tactile cues. However, in both experiments, no procedure preference was found. This indicates that the whisker-loss per se does not make the difference in the results.

Latency to Leave Tunnel
In experiments 4 and 9 (and 8.1, no CPP experiment, see Supplements), in addition to the CPP, we measured the latency of the mice to leave the handling tunnel (details in the Supplements). Already in experiments 1-3, we had observed that with progressing conditioning sessions, mice took more time to leave the tunnel when confronted with the environment in which they were fixated (experiments 1 and 2: grid, experiment 3: an upside-down cage) than when confronted with the environment in which they were weighed (experiments 1 and 2: glass jar, experiment 3: cage). Here, the procedure environment seemed to work as a CS on its own. However, in experiments 1 and 2, the latency to leave the handling tunnel was not precisely measured. Instead, notes were taken. In experiment 3 (as described in the Supplements), the latency was only measured during the last conditioning session. Thus, no comparison to a start latency and a potential bias between the different surfaces is possible.
In experiment 4, time was measured accurately. However, there was a great difference between the used environments right from the start (cage with bedding vs. glass jar), so the data was not comparable. In experiment 8.1 (see Supplements, no CPP experiment), we performed an experiment only focussing on this latency to leave the tunnel. Here, we used visual in addition to tactile stimuli (because this group of mice had no whiskers) and compared access to millet with restraint in a self-made restrainer. No difference in the latency to leave the tunnel was found.
Arguing that this could be due to the whisker-loss, we included this measurement again in experiment 9. Here, only tactile stimuli were applied to distinguish the experimental environment. Again, no difference was found. This could be due to the chosen procedure (restraint), in which the mouse was not in direct contact with the tactile stimulus but with the restrainer tube. Nevertheless, to get the mice into the restrainer, they had to be guided or sometimes chased into the tube, which itself might have worked as an aversive US, and this happened in direct contact with the tactile stimulus.
This missing association between procedure environment and procedure as seen in experiments 8.1 and 9 is very interesting. It shows that although an association between procedure environment and procedure is sometimes formed (experiments 1-3), causing mice to leave the tunnel more hesitantly, sometimes it is not. What causes the difference? In experiments 1-3, we used fixation by hand; in 8.1 and 9, restraint in a restrainer. Is it necessary to have a US with a more negative emotional effect to induce this form of conditioning? If so, could we now say that a short fixation for 20 s was therefore more severe than staying in a restrainer for 1 min? Or is it the procedure itself which facilitates the association because during fixation the mice have to be pressed briefly against the ground, and are therefore forced to have a more direct contact with the environmental stimuli? Unfortunately, these are questions which we are not able to answer with our experiments and which we will have to pass on to future studies.

Fig 1. Setup 1 and 2 used during the experiments. (A) Setup 1: Two cages connected via a tunnel, used during experiments 1-2 with either bedding material or different types of gravel as conditioned stimuli (CS). The picture is a screenshot from the videos which were used to get the start time. (B) Setup 2 (barrier) and (C) Setup 2 (plate): One cage separated by a barrier, used during experiments 3-6 with either different plates or floor patterns as CS.

Fig 2. Setup 3 used during the experiments. Two compartments are separated by a wall, used during experiments 7-9 with different wall patterns as conditioned stimuli (CS). (A) View from above, picture taken as a screenshot from the video recordings used for analysis. (B) Front view and (C) side view of the setup. In experiment 9, the separator between the two compartments did not contain an open space and the floor was covered with additional 3D-printed plates. In addition, the cage seen in (B) and (C) also belongs to experiment 9 as the procedure environment. Here, the mice experienced millet or restraint in a restrainer.
90 s after cleaning (directly after cleaning: minimum temperature about 17.0 °C, area average about 19.0 °C; after 90 s: minimum temperature about 20.0 °C, area average about 20.5 °C). This was the reason for implementing a short pause between cleaning and insertion of the next mouse during experiment 4.

Setup 2 (experiments 3-6): Time spent in each component of the setup was recorded manually (experiments 3 and 4) or with the help of the open source program BORIS (experiments 5-6, Behavioral Observation Research Interactive Software, Version 7.9.8,

Fig 3. Duration of stay of experiment 1 (n = 12).Depicted is the time spent (in percent) on the bedding paired with a specific procedure (US).For a better visual impression, we split the results with regard to cue combination (CS -US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen.Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test.Dotted lines represent chance level (50 %).US: fixation or weighing, CS: comfort white or pure bedding material.

Fig 4. Duration of stay of experiment 2 (n = 11).Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US).For a better visual impression, we split the results with regard to cue combination (CS -US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen.Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test.Dotted lines represent chance level (50 %).US: fixation or weighing, CS: marble or quartz gravel.

Fig 5. Duration of stay of experiment 3 (n = 12). Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS -US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: fixation or weighing, CS: holes or slits metal flooring plates.

Fig 6. Duration of stay of experiment 4 (n = 12). Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS -US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: millet or weighing, CS: holes vs. slits metal flooring plates.

Fig 7. Duration of stay of experiment 5 (n = 12).Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US).For a better visual impression, we split the results with regard to cue combination (CS -US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen.Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test.Dotted lines represent chance level (50 %).US: almond milk or water, CS: dots or stripes visual pattern.

Fig 8. Duration of stay of experiment 6 (n = 12).Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US).For a better visual impression, we split the results with regard to cue combination (CS -US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen.Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test.Dotted lines represent chance level (50 %).US: millet with bedding or only bedding, CS: chessboard or fabric-like visual pattern.

Fig 9. Duration of stay of experiment 7 (n = 12).Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US).For a better visual impression, we split the results with regard to cue combination (CS -US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen.Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test.Dotted lines represent chance level (50 %).US: millet or bedding, CS: vertical or horizontal stripes as visual pattern.

Fig 10. Duration of stay of experiment 8 (n = 12). Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS -US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: millet or restrainer, CS: vertical or horizontal stripes as visual pattern.

Fig 11. Duration of stay of experiment 9 (n = 12). Depicted is the time spent (in percent) on the pattern paired with a specific procedure (US). For a better visual impression, we split the results with regard to cue combination (CS -US pairing) and trial: In the habituation (= baseline) trial the initial preference pre-conditioning can be seen, whereas in the test trial, the post-conditioning preference can be seen. Thus, with successful conditioning, both subsets of cue combinations should show a decrease or increase from baseline to test. Dotted lines represent chance level (50 %). US: millet or restrainer, CS: vertical or horizontal stripes as visual pattern. Note that in this experiment the visual pattern was also combined with a fixed tactile cue.
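
The reading of the duration-of-stay figures described in the captions above can be illustrated with a minimal Python sketch. It is not the analysis code used in the experiments. The function names, the trial length of 600 s, and all numbers are assumptions chosen purely for illustration. The sketch converts time spent on the CS paired with a given US into percent of the trial, computes the mean change from baseline to test for each cue combination (CS-US pairing), and checks whether both cue combinations shift in the same direction, as would be expected with successful conditioning.

```python
from statistics import mean

CHANCE_LEVEL = 50.0  # percent; corresponds to the dotted line in the figures


def percent_time(seconds_on_cs, trial_seconds):
    """Time spent on the CS paired with the US of interest, in percent of the trial."""
    if trial_seconds <= 0:
        raise ValueError("trial_seconds must be positive")
    return 100.0 * seconds_on_cs / trial_seconds


def mean_shift(records):
    """Mean change (test minus baseline) in percentage points for one cue combination."""
    return mean(r["test"] - r["baseline"] for r in records)


def consistent_direction(shifts):
    """True if all cue combinations shift in the same direction (all up or all down)."""
    return all(s > 0 for s in shifts) or all(s < 0 for s in shifts)


# Hypothetical values for illustration only; NOT data from the experiments.
TRIAL_SECONDS = 600  # assumed trial length

# Cue combination A: e.g. CS 1 paired with the US of interest
group_a = [
    {"baseline": percent_time(300, TRIAL_SECONDS), "test": percent_time(240, TRIAL_SECONDS)},
    {"baseline": percent_time(330, TRIAL_SECONDS), "test": percent_time(270, TRIAL_SECONDS)},
]
# Cue combination B: e.g. CS 2 paired with the US of interest
group_b = [
    {"baseline": percent_time(270, TRIAL_SECONDS), "test": percent_time(210, TRIAL_SECONDS)},
    {"baseline": percent_time(300, TRIAL_SECONDS), "test": percent_time(270, TRIAL_SECONDS)},
]

shifts = [mean_shift(group_a), mean_shift(group_b)]
print("Mean shift per cue combination (percentage points):", shifts)
print("Same direction in both cue combinations:", consistent_direction(shifts))
```

Splitting by cue combination in this way keeps an initial bias for one CS from being mistaken for a conditioning effect: only a shift that points in the same direction for both pairings would be attributable to the US.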

Table 2. Summary of the CPP procedures used in experiments 1 to 9.