Adaptive, maladaptive, neutral, or absent plasticity: Hidden caveats of reaction norms

Abstract Adaptive phenotypic plasticity may improve the response of individuals when faced with new environmental conditions. Typically, empirical evidence for plasticity is based on phenotypic reaction norms obtained in reciprocal transplant experiments. In such experiments, individuals from their native environment are transplanted into a different environment, and a number of trait values, potentially implicated in individuals' response to the new environment, are measured. However, the interpretations of reaction norms may differ depending on the nature of the assessed traits, which may not be known beforehand. For example, for traits that contribute to local adaptation, adaptive plasticity implies nonzero slopes of reaction norms. By contrast, for traits that are correlated to fitness, high tolerance to different environments (possibly due to adaptive plasticity in traits that contribute to adaptation) may, instead, result in flat reaction norms. Here we investigate reaction norms for adaptive versus fitness‐correlated traits and how they may affect the conclusions regarding the contribution of plasticity. To this end, we first simulate range expansion along an environmental gradient where plasticity evolves to different values locally and then perform reciprocal transplant experiments in silico. We show that reaction norms alone cannot inform us whether the assessed trait exhibits locally adaptive, maladaptive, neutral, or no plasticity, without any additional knowledge of the traits assessed and species' biology. We use the insights from the model to analyse and interpret empirical data from reciprocal transplant experiments involving the marine isopod Idotea balthica sampled from two geographical locations with different salinities, concluding that the low‐salinity population likely has reduced adaptive plasticity relative to the high‐salinity population. Overall, we conclude that, when interpreting results from reciprocal transplant experiments, it is necessary to consider whether traits assessed are locally adaptive with respect to the environmental variable accounted for in the experiments or correlated to fitness.


| INTRODUC TI ON
Environmental conditions vary in time and space, and this can have variable consequences on species' ranges and persistence (Bonier et al., 2007;Melbourne-Thomas et al., 2021;Rafajlović et al., 2022aRafajlović et al., , 2022b. In general, the ability of organisms to survive and reproduce under a wide range of environmental conditions is considered to be of crucial importance for the long-term persistence of populations, not least under ongoing global climate change, which affects ecosystems on many levels (Brown et al., 2016;Foden et al., 2019;Parmesan, 2006;Reusch, 2014). One way for individuals to successfully cope with temporally and/or spatially changing environments is through locally adaptive phenotypic plasticity (Bay et al., 2017;Chevin & Lande, 2011;Enbody et al., 2021;Johansson et al., 2017;Lande, 2009;Levis & Pfennig, 2016;Noer et al., 2022;Pazzaglia et al., 2021;Reusch, 2014). Phenotypic plasticity is defined as the capacity of a single genotype to give rise to different phenotypes depending on the environment (Bradshaw, 1965), and it can be locally maladaptive, neutral, or locally adaptive, resulting in negative, no, or positive consequences on individuals' fitness, that is, the ability to pass on genetic material to the next generation (Chevin et al., 2010;Ghalambor et al., 2007;Storz & Scott, 2021;Velotta et al., 2018).  (Delgado et al., 2020;Ortega et al., 2017;Svensson et al., 2018;Taborsky, 2006). The difference in phenotypes with respect to the environmental conditions is referred to as a reaction norm (Chevin et al., 2010). For individuals that express plasticity in an adaptive trait, we expect reaction norms in that trait to have significant nonzero slopes when the optimal phenotype in the two environments differ. Note that an adaptive trait is such that a change of value in this trait toward (or away from) the locally optimal phenotype increases (or decreases) individuals' fitness. However, highly adaptive plasticity is expected to 'optimize' individuals' fitness in environments within the species' niche (Holt, 2003), that is, to keep the fitness approximately constant across environments. Therefore, if adaptive plasticity in traits that are under environmental selection is sufficiently high, it is expected that other traits that are strongly correlated to fitness (and yet selectively neutral with respect to the environment) would have essentially unchanged phenotypes in stressful environments (Reusch, 2014). The most obvious example of such a trait would be fitness itself (De Vienne, 2021).
However, to accurately estimate fitness empirically is challenging (Alif et al., 2022;Reid et al., 2019). Fitness is particularly difficult to estimate in short-term experiments, which last shorter than the generation-time of an individual. In such cases, it is common to rely on indirect fitness proxies such as short-term survival, individual growth rates, or individual performance in traits that are assumed to be critical for survival (Johansson et al., 2017;Rugiu et al., 2021;Wood et al., 2014), although these could, in fact, be poor indicators of long-term fitness (Lailvaux et al., 2010; see also a discussion in Bonser, 2021). In this study, we use the term fitness-indicator traits to refer to traits that are indicators of true fitness.
The adaptive maintenance of functional phenotypes in different environments is referred to as phenotypic buffering (Bradshaw, 1965).
As emphasized by Reusch (2014), phenotypic buffering complicates the interpretation of reaction norms with respect to the involvement of locally adaptive plasticity. For example, reaction norms for fitnessindicator traits undergoing phenotypic buffering are expected to be flat (i.e., absence of significantly nonzero slopes), according to the definition above, which suggests an absence of plasticity in the fitness-indicator traits. Therefore, one may incorrectly conclude that because plasticity in the measured trait is absent, adaptive plasticity is absent, although plasticity in some other (unobserved) adaptive trait may, in fact, be very high. While this problem has been verbally pointed out by Reusch (2014), it has not, to our knowledge, been formally theoretically studied.
To illustrate how reciprocal transplant experiments can lead to different conclusions depending on what trait is measured, we consider the isopod Idotea balthica (Pallas, 1772), which successfully colonized the brackish-water Baltic Sea (refer to Figure 1 in Björck (1995) for a map of the region) from the marine environment (Leidenberger et al., 2012) after the opening of the sea about 8000 years ago (Björck, 1995). In this paper, we exclude the transition zone, sensu Snoeijs-Leijonmalm et al. (2017), consisting of F I G U R E 1 Spatial variation in the local average nonplastic component of the phenotype (red), the locally optimal adaptive phenotype (black dashed line) and local average plasticity (blue) at the end of the range expansion simulation in dependence of deme number. To better visualize the variability in the nonplastic component of the phenotype, the y-axis for the locally optimal adaptive phenotype and nonplastic component is truncated at ± 100. The habitat consisted of M = 220 demes, numbered from −109 to 110. The rings and the letters A, B, C, D, and E denote the experiments and the corresponding sampling locations for the experiments listed in Table 2 (with deme A1 to the left of A2 for experiment A, and similarly for the remaining experiments). Remaining parameter values: K = 100, r m = 2, V S = 2, = 10 −6 , L = 799, = √ 1 ∕ 10, = 2 ∕ L = 0.0025, = 1, = 0.5, and = 0.5 (Table 1).
the Kattegat and the Danish Straits, in our definition of the Baltic Sea, unless otherwise stated. Outside the Baltic Sea, under typical marine salinity conditions, I. balthica individuals are iso-osmotic at a salinity of 35 practical salinity units (psu) and maintain hyperosmotic body fluids in lower salinity with the use of Na + /K + -ATPase ion pumps located in the pleopods (Postel et al., 2000). However, individuals from the typical marine environment cannot maintain homeostasis when placed in 6 psu water where they rapidly die due to the osmotic stress (Hørlyck, 1973). By contrast, I. balthica that are native to the brackish Baltic Sea conditions seem to do well in salinities down to 3 psu (Leidenberger et al., 2012). Furthermore, salinity does not have any significant effect on oxygen consumption, food consumption, growth, or short-term survival (i.e., for 12 weeks, which is much less than the generation time of the species) for adult individuals native to salinities ranging from 5 to 10 psu throughout the species' present distribution within the Baltic Sea (Wood et al., 2014). However, the survival of juveniles at the current range margin (Rugiu et al., 2021) is significantly reduced when salinity is lowered below 3 psu. Furthermore, throughout the Baltic Sea, there seems to be a reduced survival when salinity is lowered simultaneously with an increased temperature (mimicking future climate conditions; Rugiu et al., 2018). Interestingly, and in contrast to the Baltic Sea populations, a population outside the Baltic Sea, just to the north of the Öresund strait, did not exhibit any significant decrease in survival under future climate conditions with increased temperature and decreased salinity (Rugiu et al., 2018).
The finding by Wood et al. (2014), that the expression of many traits in I. balthica is uniform throughout the Baltic Sea, seems to contrast with the results by Rugiu et al. (2018Rugiu et al. ( , 2021, who found that the populations inside the Baltic Sea are more sensitive to environmental changes than populations outside the Baltic Sea (north of the Öresund). Is this because there is a hard limit for plastic responses at approximately 3 psu, or is the population southeast of the Öresund (hereafter the low-salinity population) less tolerant to salinity changes than the population in the Kattegat and Skagerrak (hereafter the high-salinity population) and mainly adapted locally through the (nonplastic) genetic component of the phenotype, or is there an issue in how the reaction norms have been interpreted? The answer might lie in the nature of the measured traits, that is, whether a trait is adaptive, or a fitness proxy (commonly assessed in short-term experiments). For example, an increase in metabolic rate could be adaptive if it would be correlated with higher food ingestion rates, growth, fecundity, and offspring survival. However, higher metabolic rates associated with less ingestion and growth would be a sign of stress and nonmaintenance of function, in which case the trait could rather be considered fitness-indicative. In the former case, a nonzero reaction norm slope for metabolic rate would imply adaptive plasticity in this trait, whereas in the latter scenario, a zero-slope reaction norm would imply an adaptive response (with adaptive plasticity in some other unobserved trait under environmental selection) that acts to maintain fitness under novel conditions. Thus, it is often important to measure multiple traits simultaneously during reciprocal transplant experiments. However, even if this is done, the connection between trait values and fitness can remain unclear unless more information about fecundity and survival is available. Notably, additional clues can be obtained from theoretical models where information about fitness and the adaptive trait is readily available and can be used to produce expectations for reaction norms, taking into account different extents of local plasticity in the adaptive trait. As we show here, such theoretical models can help interpret empirical data from reciprocal transplant experiments. This paper has two aims. Our first aim is to understand when and how measurements of different traits result in different interpretations of reaction norms. To this end, we simulate reciprocal transplant experiments on simulated data. Importantly, we do not constrain our analysis to strictly 'optimal' phenotypic buffering sensu Reusch (2014). Instead, our simulated reciprocal transplant experiments account for population pairs sampled across a wide range of environmental conditions with low, intermediate, or high local adaptive plasticity. We therefore find many cases where reaction norms in fitness-indicator traits have significant nonzero slopes (despite the involvement of locally adaptive plasticity in the adaptive trait). We refer to this as 'sub-optimal phenotypic buffering' because fitness is reduced, but not as much as it would have been in the absence of the plastic response. Based on the simulated reciprocal transplant experiments, we summarize a number of caveats associated with interpreting the norms of reaction.
Our second aim is to infer whether the low-salinity population of I. balthica has higher or lower tolerance to salinity changes than the high-salinity population, and whether the colonization of the Baltic Sea has occurred mainly through the evolution of nonplastic genetic adaptation to the low salinity environment or through the evolution of increased plasticity. To this end, we first performed reciprocal transplant experiments comparing the performance of I. balthica individuals from the north and the south of the Öresund strait (see Figure 1 in Björck (1995) where the Öresund is located to the east of Skåne in southern Sweden) in salinities of 16 and 8 psu, respectively.
To our knowledge, reciprocal transplant experiments spanning the Öresund have not been performed to date in this species. Following this, we applied the conclusions from our model to interpret the reaction norms obtained from the empirical data in order to infer which of the two populations is likely to perform better in a wider range of salinities than the other.

| ME THODS
To illustrate the difference in reaction norms for the different kinds of traits and for populations with high and low plasticity, we performed reciprocal transplant experiments on simulated data, where the proportions of the plastic and nonplastic phenotypic components of the adaptive trait were known. To further illustrate how the interpretation of reaction norms can be altered depending on what trait is measured, we analysed empirical data from I. balthica sampled from the north and the south of the Öresund (i.e., from highand low-salinity environments).

| Simulations of range expansion
Prior to simulating reciprocal transplant experiments, we performed simulations of a population's range expansion over a habitat with spatially heterogeneous environmental conditions using the model presented in . The model was not tailored to any specific species or habitat-specific environmental conditions. Instead, our main aim with the simulations was to create a spatially structured population suitable for illustrating how different extents of local plasticity and nonplastic local genetic adaptation can lead to different conclusions depending on what trait is measured.
Thus, a conceptually general, rather than species-specific, model was used, parameterized to allow the population to evolve a wide range of plasticity values across the habitat.
In our model, the population was assumed to expand its range across a habitat consisting of a one-dimensional array of demes. We modelled one quantitative trait under selection (i.e., the adaptive trait). The phenotypic value of the adaptive trait (hereafter the phenotype of the adaptive trait) was assumed to consist of a nonplastic and a plastic component that were allowed to evolve, and they approximately stabilized during the timespan simulated (100,000 generations). In the model, the local phenotypic optimum of the adaptive trait changed in space according to a spatially gradually steepening gradient ( Figure 1, dashed black line). Note that one of the main factors that determine the locally optimal plasticity of the phenotype is the local steepness of the environmental gradient . Therefore, by choosing an environment with a spatially steepening gradient in the optimal phenotype, we allowed for a wide range of plasticity values to arise by evolutionary processes (rather than being set arbitrarily) within a single modelling framework. With an optimal phenotype that changes linearly in space, by contrast, a narrower range of plasticity values is typically obtained ( Figure S1; . Generations were assumed discrete and nonoverlapping. The population was assumed to be monoecious and mating was assumed to occur randomly (thus, self-fertilization was possible, and it induced no fitness costs when it occurred). The individuals were diploid, and the phenotype of the adaptive trait (u (i) ,k ) of individual k in deme i in generation was assumed to be determined by a plastic (g (i) ,k ) and a nonplastic (z (i) ,k ) phenotype component  Here, the plastic component of the phenotype is assumed to be directly proportional to the locally optimal phenotype ( (i) ), and we refer to the coefficient of proportionality (g (i) ,k ) as plasticity (following . Plasticity and the nonplastic component of the phenotype of the adaptive trait were polygenic, each determined by multiple bi-allelic loci with additive allele effects (with effect sizes ± ∕ 2 for the nonplastic component of the phenotype and effect sizes ± ∕ 2 for plasticity). The effect of the different parameters on the evolution of plasticity was extensively studied in  and will not be considered in detail here. Instead, for the purpose of this study, we make use of the findings of  to parameterize the model in such a way to (1) allow the population to occupy the entire habitat (e.g., by choosing allelic effect sizes and the number of loci appropriately) and (2) to produce a spatially heterogeneous pattern in plasticity with the average plasticity ranging from approximately 0 to approximately 1 in different parts of the habitat. The specific parameter values are listed in Table 1.
In the model, the fitness (W (i) ,k ) of individual k in deme i in generation was assumed to depend on the phenotype (u (i) ,k ) of the individual, the locally optimal phenotype ( (i) ), the local population size ,k ), and (potentially costly) plasticity : In Equation (2), the term 1 − g (i) ,k stands for the relative reduction in fitness due to the direct cost of plasticity, that is, it does not include any fitness reduction due to potential maladaptive plasticity (hereafter this term is referred to as the cost-related function). The effect of maladaptive plasticity on fitness is accounted for by the last term of Equation (2). The parameters and stand for the scale and ,k (i) . (2)

Number of habitat demes (M) 220
Local carrying capacity (K) 100 Maximal intrinsic growth rate (r m ) 2 Width of stabilizing selection (V S ) 2 Mutation rate (μ) Shape parameter for cost-related function (γ) 0.5 Scale parameter for cost-related function (δ) 0.5 TA B L E 1 List of parameter values and notations used throughout shape parameter of the cost-related function, respectively (details in . Although the empirical evidence for direct costs of plasticity is scarce (Murren et al., 2015;Snell-Rood & Ehlman, 2021), some costs or limits to plasticity must inevitably exist because; otherwise, plasticity would always outcompete nonplastic adaptation (Chevin & Lande, 2011;. We note that the results would be qualitatively similar if the cost-related function was set to 1 in the model, but unreliable environmental cues for the plastic response were used (whereas environmental cues were reliable in the present model). This is because unreliable environmental cues generate an implicit cost to plasticity by causing a mismatch between the phenotype generated by plasticity and the actual phenotypic optimum of the environment during selection (Tufto, 2000). The parameters r m , K, and V S in Equation (2) denote the maximal intrinsic growth rate, the carrying capacity, and the width of stabilizing selection, respectively. All the parameters , , r m , K, and V S were assumed to be constant in time and the same for all individuals in all demes. The lifecycle was modelled as follows: Mating, recombination, mutation, formation of zygotes, death of adults, and dispersal of juveniles (details in Appendix S1; see also ). Recombination occurred freely between any pair of loci. Dispersal was assumed to occur according to a discretized Gaussian dispersal function (Eriksson & Rafajlović, 2021) with standard deviation = 1. The realized gene flow between demes was an evolving property of the model, governed by dispersal, the environmental selection, mutation, and drift.
The parameter values examined here ( Table 1) correspond to those used in  except that r m = 2 here (the maximal intrinsic growth rate was larger than in

| Simulations of reciprocal transplant experiments
To simulate reciprocal transplant experiments, we sampled individuals from local populations obtained at the end of the range-expansion simulations described above. For each reciprocal transplant experiment, two different demes with different locally optimal adaptive phenotypes (and different contributions from the nonplastic and the plastic components to the phenotype; Figure 1) were chosen in such a way that qualitatively different results from the reciprocal transplant experiments were expected ( Note: For each sampling location, the columns list the corresponding locally optimal adaptive phenotype, the average nonplastic component of the phenotype, and average plasticity of the local population, as well as the average nonplastic component and average plasticity of the chosen sample of 10 individuals (sample standard deviations within parentheses) for the individual experiments shown in Results, and the percentage of the phenotype that is determined by its plastic component (i.e., plasticity multiplied by the locally optimal adaptive phenotype).
to cover various combinations of plasticity differences in the different pairs of populations involved. Because the habitat was symmetric around the centre with respect to the horizontal axis, all samples, except sample A1, were taken from the right half of the habitat. From each deme, 10 individuals were sampled at random. Each of these individuals was assumed to have mated randomly within the local population before sampling. Since r m = 2, the number of offspring in the absence of density regulation (which is typically a good approximation for lab conditions) was large enough ( To account for stochasticity, we performed 1000 realizations for each sampling location.

| Analysis of simulation results
From each reciprocal transplant experiment, we extracted reaction norms for the adaptive trait by measuring its realized phenotype in the native and the new environment in a given experiment.
Furthermore, reaction norms were obtained for a trait that was assumed to have a trait value correlated to fitness (hereafter the fitness-indicator trait). The fitness-indicator trait was not explicitly modelled. Instead, its phenotype was assumed to be regulated by how locally well-adapted an individual is (e.g., through altered signalling pathways in new environments). Note that, as a consequence of this assumption, the fitness-indicator trait exhibits plasticity whenever fitness differs between the native and the new environment.
To simplify the analysis, we defined the phenotype of the fitnessindicator trait to be equal to the phenotype-dependent component of fitness, obtained by setting N (i) ,k = K in Equation (2), yielding The notation in Equation (3) is the same as in Equation (2), except that here we omit the index for generations to emphasize that only one generation is considered during the reciprocal transplant experiment. Since fitness was assumed to be the main variationcausing factor in the fitness-indicator trait among individuals in the model, we used fitness itself as the fitness-indicator trait for simplicity. In this way, any plasticity in the fitness-indicator trait is strictly neutral in our model, which is a good approximation for a trait that is strongly correlated to fitness and either evolves neutrally or has an optimum value that is uniform among the different environments.
To test for differences in the mean phenotype between the native and the new environment for each pair of sampled populations, we used the Kruskal-Wallis test (Kruskal & Wallis, 1952) followed by Bonferroni's post hoc test (Miller, 1981) for multiple comparisons.
The Kruskal-Wallis test was preferred over an ANOVA to avoid the requirement of normally distributed residuals. Empirical data were analysed similarly (see below). Effect sizes for differences in the mean phenotypes were assessed using Glass' Δ estimator (Glass et al., 1981).
To assess genetic differences between populations sampled from different demes, principal component analysis (PCA) was performed on the genetic similarity matrix, accounting for all simulated loci (underlying both the plastic and the nonplastic component of the phenotype).
The insights from the simulated reciprocal transplant experiments (described above) guided our interpretation of empirical data from salinity trial experiments of I. balthica (explained next).

| Empirical salinity trial experiment
During summer of 2018, Idotea balthica individuals from two different locations, spanning the steepest part of the salinity gradient of the Baltic Sea (including the transition zone), were used in a salinity trial experiment. The distance between the sampling locations was much larger than the typical dispersal distance (excluding potential rare rafting events), meaning that there was presumably little gene flow between the two populations (whereas this was not the case for all our simulated reciprocal transplant experiments, Figure 1, and see "Direction of significant reaction norms and their effect sizes" in Section 3). Adult isopods were collected in Vejbystrand (the high-salinity population; N 56.32°, E 12.76°; native mean salinity 16 psu) and in Svarte (the low-salinity population; N 55.43°, E 13.72°; native mean salinity 8 psu; salinity data are 0-6 m depth averages across 1995-2004 from the Rossby Centre Oceanographic circulation model [Meier et al., 2003]) and were reciprocally transplanted into their native and the alternative (higher or lower) salinity (n ≈ 75 per treatment for the high-salinity population, n ≈ 50 per treatment for the low-salinity population).
Isopods were kept in individual flow-through tanks for a period of 2 months, while fed their native diet, the brown alga Fucus vesiculosus. To avoid differences in palatability and metabolite composition of the food, which have been found to vary by geographic location (Milec et al., 2022;Nylund et al., 2012), all algae were collected near the Tjärnö Marine Laboratory (N 58.88°, E 11.14°), where the experimental work was conducted. In this way, we ensured that it was only the salinity that varied between the two environments.
At the onset of the experiment, the algae were separated into equally large pieces and placed in individual jars with or without isopods. We chose to focus on two commonly measured physiological traits for the examination of plasticity: Food consumption and respiration (as a proxy for metabolic rate). Grazing rates were monitored by weighing algae at the start and end of the experiment (carried out at 13°C). The ungrazed algae pieces were used as controls for growth/water accumulation in tissues. After the 2-month period, oxygen consumption rates were measured for each individual isopod.
Each individual was placed in a 5 ml glass vial and incubated at 6°C in the dark for 1 h. Before and after the incubation period, the oxygen concentration was measured every 15 s for a total of 3 min with a PreSens optical oxygen sensor. We present the mean values of these measurements. After the oxygen consumption measurements, the isopods were weighed and killed by decapitation and immediately placed in 95% ethanol. Genetic analyses were used to investigate population genetic differences among the two collecting locations, as well as to ensure that the reciprocal transplant was random with regard to the genetic makeup of the isopods (see below).

| Analysis of empirical data
The grazing and respiration rate data were standardized by the

| Reaction norms from simulated data
In the following, we present results obtained from simulated reciprocal transplant experiments for the sampling locations presented in Table 2.
As expected, with low plasticity and a small difference in locally optimal adaptive phenotype between the two sampling locations (A1 and A2), in a majority of realizations (96%) there was no significant phenotype difference in the adaptive trait between individuals in the native and the new environment, both for individuals that were native to deme A1 and for individuals that were native to deme A2. By contrast, in the majority of realizations (86%), there was a significant difference in the fitness-indicator trait both for individuals sampled from deme A1 and for individuals sampled from deme A2. These findings are further supported by small effect sizes (Δ) for the phenotype differences between the environments for the adaptive trait (Δ < 0.1) but large effect sizes for the fitness-indicator trait (Δ > 1). A typical simulation result for the reaction norms is shown in Figure 2 A. In this simulation, there was a clear genetic differentiation between the population that was native to deme A1 and the population native to deme A2 ( Figure S2a), mainly because the nonplastic component of genetic adaptation differed between the two populations.
The most common outcome of the reciprocal transplant experiments for sampling locations B1 and B2 was that there was no significant difference in phenotype of the adaptive trait between the native and the new environment for individuals that were native to deme B1, but a significant difference, with large effect size, between native and new environment for individuals that were native to deme B2 (Figure 2b). This outcome occurred in 55% of the realizations, and in the other 45% of the realizations, none of the two populations had a significant difference in the adaptive trait.
For the fitness-indicator trait, there was in all 1000 realizations a significant difference with large effect size between the phenotype in the new and the native environment, both for individuals that were native to deme B1 and for individuals that were native to deme B2. However, the reduction in fitness was less severe for individuals that were native to deme B2 than to B1. As for sampling locations A1 and A2, there was a clear genetic differentiation between the two populations ( Figure S2b). inside the boxes denote the median of the data and the boxes span between the upper and lower quartiles (75th and 25th percentiles). The whiskers indicate the maximum and minimum of the nonoutlier data. Outliers (denoted by rings) were computed using the interquartile range (i.e., points above the upper quartile +1.5 times the distance between upper and lower quartile or below the lower quartile −1.5 times the distance between the upper and lower quartile were considered outliers). Trait values in the native environments are denoted by red box plots and trait values in the new environments are denoted by blue box plots. All phenotypic values for the adaptive trait are translated so that the minimum value measured in each experiment is zero. populations native to the two different environments were genetically differentiated (Figure 2c, Figure S2c). However, the reduction in fitness was smaller than for demes B1 and B2, despite a larger difference in locally optimal adaptive phenotype.
When there was almost no difference in the nonplastic phenotype component of the adaptive trait, but the plasticity was high and differed between the two populations (sampling locations D1 and D2), we observed a significant difference with large effect size in the adaptive trait between the native and new environment for both populations in all 1000 realizations. For the fitness-indicator trait, we observed, in 40% of the realizations, a significant reduction in fitness with typically large effect size in the new environment for individuals that were native to deme D1 but not for individuals that were native to deme D2 (Figure 2d). In 32% of realizations, we found that none of the populations had significant differences in phenotypes for the fitness-indicator trait. The remaining two possibilities, namely that both differences were significant, or that only the population native to D2 but not the one native to D1 had significantly nonzero reaction norms, each occurred in 14% of the realizations. In the simulation result shown in Figure 2d, (a single randomly chosen realization) there was a less clear separation between the two populations by PCA than for populations sampled at locations A1 and A2, B1 and B2, or C1 and C2, despite the relatively large difference in the locally optimal adaptive phenotype ( Figure S2d; compare to panels a-c). Note that we did not have neutral loci in the model, which could contribute to differentiation between natural populations.
When plasticity was very high (>0.9) in both environments  Table 3.
For reciprocal transplant experiments across larger distances in our model (recall that the environmental gradient was steepening in space), the reaction norms for the fitness-indicator trait were always significant with large effect sizes (typically with p ≪ 10 −10 and | Δ | ≫ 1; Figure S3). For the adaptive trait, the reaction norms were also typically significant, except for when plasticity in the native environment was very low ( Figure S3a). Note that fitness, in all cases, was extremely low in the new environment, which implies that individuals would not be able to reproduce there, although they may survive in short-term experiments (which could lead to an overestimation of the actual fitness in the experiments). In all cases, there was a clear genetic differentiation between the two populations ( Figure S4). By contrast, the fitness-indicator trait value was higher in the native environment (regardless of whether the native environment is deme Y1 or Y2) than in the new environment when Y = a, b or c. We analyse this further next.

| Direction of significant reaction norms and their effect sizes
To further understand how the difference in local phenotypic optima and plasticity influence the reaction norms, including their significance, direction, and effect sizes, we simulated additional reciprocal transplant experiments with one (and the same) sampling location being involved in each experiment as the "focal" deme, whereas the second "alternative" deme varied among the experiments. The focal deme was either deme 0 (with an average plasticity of 0.02), or deme 20 (with an average plasticity of 0.06). The alternative deme was every second deme between deme 2 and deme 110, or between deme 22 to deme 110, respectively (see Figure 1 for deme numbers and an illustration of the habitat). Note that the population native to any alternative deme had higher plasticity on average than the population native to either focal deme (blue line in Figure 1).

Significant phenotype difference in adaptive trait
Reaction norms for the adaptive trait were typically in opposite directions (that is, the phenotype difference between foreign and native environments was of opposite sign), when individuals were transplanted from their respective native environments to the new environment, as expected ( Figure 3a). The effect sizes of the phenotypic differences were large when the differences in locally optimal phenotype (Δ ) between the focal and alternative environments were large enough ( Figure 3a). When Δ was kept constant and both populations were moved along the environmental gradient such that the plasticity in the native environment was varied, reaction norms for the adaptive trait were flat when plasticity was very low ( Figure S5a) whereas reaction norms for the fitness-indicator trait were flat when plasticity was high ( Figure S5b). The effect sizes for differences in the adaptive trait were large when plasticity was high ( Figure S5c). By contrast, the effect sizes for the differences in the fitness-indicator trait were very large when plasticity was low, but small when plasticity was high ( Figure S5d).
We additionally examined the norms of reaction in a model with a phenotypic optimum that changed linearly in space (rather than according to a steepening gradient). In the experiments with a fixed focal environment and a varying alternative environment in this model, the results were qualitatively similar to those in Figure 3.
However, for the linearly changing optimum, the spatial variability of plasticity was smaller than in the model with a steepening environmental gradient ( Figure S1). Consequently, the transplant results

F I G U R E 3 Plots of effect sizes with direction (a, b, e, and f) and
p-values for nonzero slopes of reaction norms (c, d, g, and h) as a function of the difference in locally optimal adaptive phenotype (Δ ) between the focal and the alternative environment for the adaptive trait (left column), or for the fitness-indicator trait (right column). Results are shown for focal deme numbered 0 (a-d) or 20 (e-h). The blue rings stand for reaction norms for individuals that are native to the focal deme (with low plasticity) whereas red crosses stand for reaction norms for individuals that are native to the alternative deme (where plasticity is higher than in the focal deme). The dashed lines indicate the line beyond which absolute values of effect sizes can be considered large (c, d, g, h) according to Cohen's rules-of-thumb (Sawilowsky, 2009), or where p = 0.05 for the Kruskal-Wallis test (c, d, g, h). Note a logarithmic scale on the y-axis in panels a, b, e, and f, and on the x-axis in all panels.
between the focal and alternative environments were more rarely asymmetric (i.e., with a significant nonzero slope of the norm of reaction for only one population) than in the model with a steepening gradient ( Figure S6), although some asymmetry existed when the environmental gradient was shallower, owing to a stronger gradient in realized plasticity.

| Empirical data
For the population collected in the high-salinity environment (native salinity 16 psu), neither grazing nor respiration exhibited a statistically significant difference between the two experimental salinity treatments (Figure 4a,b). For the population collected in the low-salinity environment (native salinity 8 psu), however, grazing rates were significantly lower (p = 0.008; effect size Δ = − 0.538) and respiration rates were higher (p = 0.003; Δ = 0.521) in the high-salinity environment than in the native salinity (Figure 4c,d). Two-way ANOVAs did not show significant differences in either grazing or respiration rate between the two populations (Tables S1). However, there was a significant difference between the environments for grazing (Tables S1), but not for respiration rate (Tables S2). Furthermore, there were significant GxE effects in respiration rate and grazing. Bonferroni's post hoc test revealed that the significant differences were between the native and the new salinity for individuals from the low-salinity population for both grazing and respiration rate. For respiration rate, there was also a significant difference between low-salinity and high-salinity individuals in a salinity of 8 psu (Tables S2).
The genetic data showed that the two populations are genetically differentiated ( Figure 5).

| Reciprocal transplant experiments on simulated data
No significant differences between the trait values for the adaptive trait in the native and the new environment usually occurred for populations that had both negligible plasticity in the adaptive trait and a small difference between the phenotypic optima (although significant differences could sometimes occur by chance).
By contrast, we found statistically significant nonzero slopes in In contrast to the adaptive trait, the slopes of reaction norms were significantly nonzero for the fitness-indicator trait when locally adaptive plasticity in the adaptive trait was too low to keep the difference in fitness between the new environment and the native environment statistically insignificant. That is, the slopes of reaction norms were significantly nonzero when phenotypic buffering of the fitness-indicator trait was sub-optimal. By contrast, when locally adaptive plasticity in the adaptive trait was high enough to keep fitness unchanged in the new environment relative to the native environment, that is, when phenotypic buffering was optimal, there were no significant differences in the fitness-indicator trait between the two environments.
Note that, unless one of the populations is better adapted to the other population's environment than to its own native environment (e.g., if one population has very recently colonized its current habitat) the fitness in the non-native environment is expected to be either lower or equal to the fitness in the native environment. Thus, significant nonzero slopes of reaction norms for fitness-indicator traits have the same direction (i.e., the same sign of the phenotype difference between the new and the native environment) for both populations, unless one of the populations is better adapted to the native environment of the other population than to its own native environment. This may allow us to distinguish between fitnessindicator traits with suboptimal phenotypic buffering and adaptive traits with plasticity. Namely, when for both populations the reaction norms for a fitness-indicator trait are significantly nonzero and have the same direction (indicating a decrease in fitness in the nonnative environment), potential plasticity in the adaptive trait is not sufficiently adaptive to maintain the locally optimal adaptive phenotype in the new environment.
When the reciprocal transplant experiments were carried out across a long enough distance (implying a large difference in phenotypic optimum in our model), we always observed a significantly lower fitness in the new environment compared to the native environment (because the phenotype was never determined solely by plasticity in our model, some nonplastic genetic adaptation was always involved). Thus, whether phenotypic buffering is optimal depends on the difference between the two environments: A population may experience optimal phenotypic buffering when the difference in optimal phenotype between the native and the new environment is small enough, but sub-optimal phenotypic buffering otherwise. Indeed, when local plasticity is higher in one of the environments than in the other, phenotypic buffering may be optimal for one population but sub-optimal for the other.
In our model, plasticity increases from the centre toward the edges of the habitat. Notably, however, environmental conditions in natural populations consist of both abiotic and biotic factors, for example, species competition (Case & Taper, 2000;Turcotte & Levine, 2016), and the combined effect of these could lead to harsher conditions and/or steeper gradients in the central parts of the habitat than in the habitat edges (e.g., if competition is stronger in central parts of the habitat), which could potentially promote the evolution of higher plasticity in the centre than in the edges.
Elucidating the explicit role of biotic versus abiotic factors and their interplay for the evolution of plasticity and nonplastic genetic adaptation is an interesting avenue for future work, both theoretically and empirically.
Recall that we here assumed that the fitness-indicator trait was equal to the phenotype-dependent component of fitness (which depends also on the cost of plasticity). However, assuming orthogonality between traits and free recombination between the loci underlying the traits, we expect that our conclusions hold also for any trait f that is a linear function of the phenotype-dependent F I G U R E 5 Principal component analysis of identity-by-state (IBS) distances, using 28,507 2b-RAD derived genetic loci. Low-salinity individuals are coloured in red, high salinity in blue. Circles represent the 95% confidence intervals of the two experimental groups. Collecting location is a strong determinant of the genetic variation (PERMANOVA p < 0.001).
component of fitness w, that is, f = a + bw. Here, a and b are environment-independent trait components (or trait components having a spatially uniform optimal phenotype) with a polygenic basis, allowed to evolve over time, and w is the phenotype-dependent component of fitness, Equation (3). Note that the coefficient b can take any real value, but for b < 0, the trait f would be negatively correlated to w, whereas for b = 0 the trait would be uncorrelated to w. This is because the population average phenotype of a neutral trait orthogonal to an adaptive trait eventually evolves to a spatially constant value (Tomasini et al., 2022).

| Interpretation of empirical data
We observed a statistically significant increase in respiration, measured as a proxy for metabolic rate, and a decrease in grazing activity in Baltic Sea I. balthica when transplanted from a native salinity of 8 psu to a salinity of 16 psu. By contrast, no significant differences in grazing activity or metabolic rate were apparent when I. balthica native to a salinity of 16 psu were transplanted to a salinity of 8 psu. Because flat reaction norms may suggest either very high locally adaptive plasticity (i.e., phenotypic buffering) or very low (or absent) locally adaptive plasticity, the possible conclusions with respect to the combinations of the types of traits measured (fitness indicator or adaptive) are the following: If both traits are adaptive, the results would suggest that the high-salinity population (native salinity 16 psu) does not have any significant plasticity with respect to salinity, whereas the lowsalinity population (native salinity 8 psu) expresses plasticity in salinity tolerance. In this case, the plastic responses are expressed as a change in respiration rate and grazing activity. By contrast, if at least one of the traits is a fitness-indicator trait, the high-salinity population would have locally adaptive plasticity in some unmeasured adaptive trait, which keeps fitness unchanged when transplanted to the low-salinity environment (thus, the fitness-indicator trait is phenotypically buffered sensu Reusch, 2014). In this case, plasticity in the adaptive trait with respect to salinity is lower for the low-salinity population than for the high-salinity population (or the low-salinity population lacks plasticity in the adaptive trait with respect to salinity), and the low-salinity population, therefore, has a reduced fitness in the high-salinity environment. Note that it is sufficient that one of the two measured traits is a fitness indicator to conclude that the low-salinity population of I. balthica has significantly lower fitness when transplanted to the high-salinity environment (and the conclusion remains the same regardless of which trait is a fitness indicator).
To determine which of the two possibilities is more likely (i.e., if both traits are adaptive, or at least one trait is a fitness-indicator trait)  Postel et al., 2000). However, the closely related species Idotea chelipes in the Baltic Sea (native salinity 7 psu) has been shown to osmoconform, and thus reduce metabolic rates (presumably due to less required ion pumping) at salinities above 11 psu (Łapucki & Normant, 2008). Similarly, Normant and Lamprecht (2006) found that Baltic Sea individuals of the amphipod Gammarus oceanicus reduce their metabolic rates as well as feeding rates in higher salinities. These findings contrast with our result. The observed increase in metabolic rate for I. balthica should be accompanied by an increase in food intake as more energy is required, if the increased metabolic rate is locally adaptive. However, instead we observed a reduction in the amount of grazing. We further note that the grazing activity is the same in the native environments for both populations (Tables S3), which further suggests that this may be a fitness-indicator trait that is phenotypically buffered with respect to salinity for the high-salinity population, but sub-optimally phenotypically buffered with respect to salinity for the low-salinity population. Combined, these observations suggest that the phenotypic changes observed when transplanting low-salinity population isopods to 16 psu seawater are nonadaptive with respect to salinity changes and that suboptimal phenotypic buffering occurs for the low-salinity population, at least for the grazing rate. By contrast, optimal phenotypic buffering with respect to reduced salinity seems to occur for the high-salinity population, suggesting that this population has higher locally adaptive plasticity with respect to reduced salinity than the low-salinity population has with respect to increased salinity. Importantly, however, it remains to be understood whether optimal phenotypic buffering for the high-salinity population would be retained if we included in the experiment also other environmental factors that differ between the sampling locations (and that may have been involved in adaptation).
This is an important open question for future studies.
Interestingly, our data show that the respiration rate is the same in all environments except for the low-salinity population in its native salinity. By interpreting the respiration rate in the native environment as the locally optimal respiration rate, this finding would imply that respiration rate is an adaptive trait and that the lowsalinity population, but not the high-salinity population, has locally adaptive plasticity in respiration rate. However, as the expression of many traits influences the metabolic rate and therefore also the respiration rate, the respiration rate is clearly correlated to multiple other traits, that may or may not be involved in adaptation to salinity. Genetic differentiation in these traits may cause the optimal respiration rate to differ between individuals from the low-salinity population and individuals from the high-salinity population, even when they are exposed to the same salinity level. Additionally, even if the plastic response in respiration rate is indeed adaptive, it could be counteracted by locally maladaptive plasticity in other traits that are also affected by salinity.
Note that the low-salinity population must still have a relatively high locally adaptive plasticity in the adaptive traits if metabolic rate and/or grazing are fitness-indicator traits, because metabolic rate and grazing are constant with respect to the salinity levels within the Baltic Sea (Wood et al., 2014). Thus, it seems likely that the lowsalinity population either has a reduced salinity-related locally adaptive plasticity relative to the high-salinity population, but still high enough locally adaptive plasticity to maintain homeostasis within the entire Baltic Sea, and/or that it has an effective response across a different salinity range (e.g., between 3-15 psu, whereas the highsalinity population may tolerate a salinity range between 10-35 psu).
Our empirical results also suggest that there is some genetic differentiation between the high-and low-salinity populations. A recent study (De Wit et al., 2020)  general, yet simple grounds to provide insights into potential caveats associated with the interpretation of reaction norms in dependence on whether the trait being measured is an adaptive trait or as a fitness-indicator trait, while assessing the signatures of optimal and sub-optimal phenotypic buffering. In other words, our model was not specifically tailored to account for the detailed biology of I. balthica, nor for the precise heterogeneity of key environmental conditions in the Baltic Sea and the transition zone. Although salinity is the main abiotic factor that varies geographically in the Baltic Sea (Björck, 1995;Johannesson et al., 2020), additional environmental variables may play a role in adaptation of I. balthica during its range expansion history. Among these factors are the palatability of F. vesiculosus, the temperature, the exposure to waves, and the pH, although wave exposure and variability in pH are probably largest on the microhabitat level (Wahl et al., 2018). Adaptation to multiple environmental factors may generate signals of genetic differentiation beyond those corresponding to adaptation to the observed salinity gradient alone.
We thus conclude that the low-salinity population of I. balthica is probably more sensitive to increased salinity than the high-salinity population is to decreased salinity, possibly because plasticity has been lost through serial founder effects during colonization of the Baltic Sea. Another possibility is that plasticity with respect to salinity has been reduced in the Baltic Sea (i.e., in our low-salinity population) because the salinity gradient is shallower there than in the Öresund  and/or because the low-salinity population experiences weaker fluctuations in salinity than the high-salinity Recall that we assumed in our simulated experiments that the populations had stabilized, which may not be the case for populations in the Baltic Sea. If the low-salinity population has recently colonized its habitat, we expect that its fitness would likely increase when transplanted to the high-salinity environment, but this is not consistent with the combination of increased metabolic rate and decreased food consumption. However, the opposite is also possible (albeit less likely) due to repeated founder events and potential spread of genotypes that are less-than-average adapted to their source popula-

| Caveats and recommendations for future work
In the following, we summarize four key messages from our study.

| The interpretation of reaction norms differs depending on whether a trait is adaptive or a fitness indicator
Naïvely interpreting reaction norms as if the assessed trait is adaptive may lead to the wrong conclusions when the assessed trait is, instead, an indicator of fitness. The same is true when the assessed trait is adaptive but wrongly assumed to be a fitness indicator.
Therefore, we suggest that researchers should consider in advance whether the traits assessed in an experiment are adaptive or fitness indicators in the species in question and the given environmental context. However, if it is not known a priori whether a trait assessed is adaptive or a fitness indicator, both possibilities should be considered along the lines illustrated in this study.
We note that it is possible that traits lie on a continuous spectrum between purely adaptive and purely fitness-indicator traits due to, for example, coupling (realized through, e.g., linkage disequilibrium or pleiotropy) between adaptive and fitness-indicator traits.
This could be species and/or environmental specific.
We, therefore, encourage researchers to report whether the assessed trait is a fitness indicator or an adaptive trait because this will help clarify on which grounds their interpretation of the empirical results are based on. With this, it will also be possible to perform meta-analyses that will allow us to understand the type of different traits and to which extent the type of a given trait is species and/or environment specific.

| Phenotypic buffering may be sub-optimal
Recall that phenotypic buffering (sensu Reusch, 2014) implies a flat reaction norm for a fitness-indicator trait. Here, however, we call this optimal phenotypic buffering, which should be distinguished from sub-optimal phenotypic buffering wherein reaction norms in a fitness-indicator trait are not flat. Whether phenotypic buffering is optimal or sub-optimal is likely species-and environment-dependent and is likely to vary throughout a species' range.
Interestingly, sub-optimal phenotypic buffering can help us to distinguish between fitness-indicator and adaptive traits in reciprocal transplant experiments. Namely, if for both populations the reaction norms in a given trait have the same direction in comparison to each other, then the trait is most likely a fitness indicator.
Otherwise, if the reaction norms have the opposite direction in comparison to each other, the trait is most likely adaptive. By contrast, when optimal phenotypic buffering occurs (e.g., when the difference in optimal phenotype between native and new environment is very small) the reaction norms may be flat for both kinds of traits. In these cases, it may not be possible to tell whether the assessed trait is better described as an adaptive trait or as a fitness indicator. This may be overcome by performing additional reciprocal transplants, including locations with more distinct environmental conditions. Alternatively, if the assessed trait can reasonably be assumed to be orthogonal to other key traits of the species, one may consider the phenotypic values of the trait when optimal phenotypic buffering occurs: if the trait is adaptive, any population produces the same phenotype on average when exposed to the same environment, and if the trait is a fitness indicator, then any population produces the same phenotype on average in all envi-

| Modelling and computer simulations can aid the interpretation of reaction norms
Simulations that are specifically tailored to fit the population and the environment of interest can bring further insights into adaptive responses of populations and the interpretation of reaction norms.
Simulations may, for example, indicate fitness trade-offs, the extent of local plasticity and nonplastic genetic adaptation, genetic differentiation, and expectations for reaction norms with respect to the different types of traits assessed in experiments.
As a final note, we believe that more collaborative work between empiricists and theorists is necessary to gain further understanding of the role of nonplastic genetic adaptation and plasticity in the evolution of natural populations.

ACK N OWLED G M ENTS
We gratefully thank Kerstin Johannesson and Roger K. Butlin for insightful comments on the manuscript. We are also thankful to 2018-05973.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
All raw sequencing data produced in this study are available at NCBI