A Bayesian approach to inferring dispersal kernels with incomplete mark-recapture data

Dispersal is a fundamental ecological process that links populations, communities and food webs in space. However, dispersal is tremendously difficult to study in the wild because we must track individuals dispersing in a landscape. One conventional method to measure animal dispersal is a mark-recapture technique. Despite its usefulness, this approach has been repeatedly criticized because it is virtually impossible to survey the full range of dispersal distances in nature. Here, I propose a novel Bayesian model to better estimate dispersal parameters from incomplete mark-recapture data. The dispersal-observation coupled model, DOCM, can extract information from both recaptured and unrecaptured individuals, providing less biased estimates of dispersal parameters. Simulations demonstrated the usefulness of the DOCM under various sampling designs. I also suggest extensions of the DOCM to accommodate more realistic scenarios. Application of the DOCM may, therefore, provide valuable insights into how individuals disperse in the wild.

behind the study area, causing serious underestimation of dispersal parameters. Second, even when marked individuals remain in the study area, imperfect detection of marked individuals may make it difficult to infer dispersal processes (Pepino et al. 2012; Rodriguez 2002). To date, several statistical models have been proposed to overcome these difficulties (Fujiwara et al. 2006; Pepino et al. 2012; Rodriguez 2002). For example, Rodriguez (2002) developed a general class of dispersal models that describe how marked individuals are recaptured through dispersal and sampling processes. However, these models are implicit about unrecaptured individuals and/or have limited extendibility to more complex models that capture plastic and context-dependent dispersal. Hence, there is a need to develop a new class of statistical models with a greater capacity for extension.

Bayesian inference provides a flexible statistical framework that may help overcome the challenges of using mark-recapture data (Kéry & Schaub 2012; Terui et al. 2017). Here, I introduce a novel Bayesian model that integrates dispersal and observation processes into a single coupled model. The dispersal-observation coupled model, DOCM, can extract information from both recaptured and unrecaptured individuals. Consequently, the model can provide less biased estimates of ecological parameters. In this study, I demonstrate the usefulness of the DOCM using simulated test datasets produced under various sampling designs and discuss its capacity for extension to more realistic models.

Model

I consider a situation in which a virtual ecologist conducts a mark-recapture study in a one-dimensional space (e.g., a stream). They choose a section with length $L$ for the mark-recapture study (i.e., the observation section) and divide it into subsections with length $l$. The number of subsections is thus $L/l$.
In each subsection, virtual ecologists perform an initial capture survey and assign a subsection ID to each individual to record its location. After being uniquely marked, captured individuals are released at the center of the subsection where they were caught. Released individuals then disperse freely for a certain period, after which a recapture survey is conducted in the observation section. Since the observation section is a finite domain, individuals can leave this area. Moreover, even marked individuals that stay in the observation section can be recaptured only if they survive, and then only with some probability. Thus, to be recaptured, individuals must (1) stay in the observation section, (2) survive until the recapture survey, and (3) be detected given that they survive and remain in the observation section. To represent this data-generating process, I propose the following modeling framework that integrates dispersal and observation processes (Figure 1).

Dispersal model. I first model the dispersal process. Let $x_{0,i}$ and $x_i$ be the locations at the initial capture and recapture sessions, respectively, for individual $i$. The variables $x_{0,i}$ and $x_i$ may be expressed as the distance from the center of the capture/recapture subsection to one end of the observation section (e.g., the downstream end of the study section in streams). I assume the recapture location $x_i$ to follow a Laplace distribution, a dispersal kernel commonly used in the dispersal literature (Nathan et al. …):

$$x_i \mid x_{0,i}, \delta \sim \mathrm{Laplace}(x_{0,i}, \delta), \qquad f(x_i \mid x_{0,i}, \delta) = \frac{1}{2\delta}\exp\left(-\frac{|x_i - x_{0,i}|}{\delta}\right) \tag{1}$$

The parameter $\delta$ is the average dispersal distance. Equation 1 illustrates that the recapture location $x_i$ is conditional on the release location $x_{0,i}$ and the dispersal parameter $\delta$ (Figure 1).
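The Laplace kernel in equation 1 can be checked numerically. The Python/NumPy sketch below is an illustration only, not the author's implementation; the function names `laplace_pdf` and `laplace_cdf` and the values of $x_{0,i}$ and $\delta$ are hypothetical. It simulates displacements and confirms that their mean absolute value recovers $\delta$, which is why $\delta$ can be read as the average dispersal distance.

```python
import numpy as np

def laplace_pdf(x, x0, delta):
    """Equation 1: density of recapture location x given release location x0."""
    return np.exp(-np.abs(x - x0) / delta) / (2.0 * delta)

def laplace_cdf(x, x0, delta):
    """P(X <= x) for the same kernel; handy for integrating over subsections."""
    z = (x - x0) / delta
    tail = 0.5 * np.exp(-np.abs(z))          # mass beyond |z| on one side
    return np.where(z < 0, tail, 1.0 - tail)

rng = np.random.default_rng(1)
x0, delta = 250.0, 100.0                     # hypothetical release location and delta (m)
x = rng.laplace(loc=x0, scale=delta, size=200_000)

# The mean absolute displacement recovers delta (the average dispersal distance)
mean_disp = np.mean(np.abs(x - x0))
print(f"mean |x - x0| = {mean_disp:.1f} m (delta = {delta:.0f} m)")

# The empirical CDF of the simulated locations matches the closed form
print(np.mean(x <= 300.0), float(laplace_cdf(300.0, x0, delta)))
```

The closed-form CDF is what makes the integrals over subsections and over the observation section cheap to evaluate, so no numerical integration is needed for the Laplace kernel.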

Observation model. After dispersal, marked individuals are subject to an imperfect observation process. Let $y_i$ be the variable representing the recapture history of individual $i$ ($y_i = 1$ if recaptured; otherwise 0). The response variable $y_i$ can be modeled as a realization of a Bernoulli distribution:

$$y_i \sim \mathrm{Bernoulli}(\pi_i \phi p) \tag{2}$$

where $\pi_i$ is the probability that individual $i$ moves to the subsection of recapture (recaptured individuals) or stays in the observation section (unrecaptured individuals), $\phi$ is the survival probability between release and recapture, and $p$ is the detection probability during a recapture survey. The parameters $\phi$ and $p$ can be separated if an independent dataset to estimate detection probability, e.g., multiple-pass removal data, is available (Dorazio et al. 2005). Otherwise, the two parameters need to be condensed into a recapture probability $r$ ($= \phi p$) such that:

$$y_i \sim \mathrm{Bernoulli}(\pi_i r) \tag{3}$$

Here, I couple the observation and dispersal models by describing $\pi_i$ as a function of the dispersal parameter $\delta$ and release location $x_{0,i}$. Specifically, $\pi_i$ is denoted as:

$$\pi_i = \begin{cases} \displaystyle\int_{x_i - l/2}^{x_i + l/2} f(x \mid x_{0,i}, \delta)\, dx & \text{if } y_i = 1 \\[2ex] \displaystyle\int_{0}^{L} f(x \mid x_{0,i}, \delta)\, dx & \text{if } y_i = 0 \end{cases} \tag{4}$$

Recaptured individuals are known to be present in the subsection of recapture, so the range of integration is $x_i - l/2$ to $x_i + l/2$ in equation 4 (i.e., from one end of the subsection to the other). This expression gives the probability of movement from the release location to the subsection of recapture given the dispersal parameter $\delta$. For unrecaptured individuals, equation 4 accounts for two important facts: (1) a greater value of the dispersal parameter $\delta$ decreases the probability of remaining in the observation section ($\pi_i$), and (2) the release location $x_{0,i}$ influences $\pi_i$ (i.e., individuals released near the edge of the observation section are more likely to emigrate; Figure 1b). Key parameters in the DOCM are summarized in Table 1.
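To make the coupling concrete, the sketch below evaluates the movement probability $\pi_i$ of equation 4 in closed form through the Laplace CDF and sums the Bernoulli log-likelihood with success probability $\pi_i r$. It is a hypothetical NumPy illustration rather than the JAGS model of Box 1: unrecaptured individuals are coded as `NaN`, and the design values ($L$ = 500 m, $l$ = 50 m, $\delta$ = 100 m, $r$ = 0.5) are assumptions chosen for the example.

```python
import numpy as np

def laplace_cdf(x, x0, delta):
    """P(X <= x) for a Laplace kernel centred on x0 with scale delta."""
    z = (x - x0) / delta
    tail = 0.5 * np.exp(-np.abs(z))
    return np.where(z < 0, tail, 1.0 - tail)

def movement_prob(x, x0, delta, L, l):
    """pi_i of equation 4.

    x holds recapture locations (subsection centres); np.nan marks
    unrecaptured individuals.
    """
    recaptured = ~np.isnan(x)
    xs = np.where(recaptured, x, 0.0)            # dummy value where x is NaN
    p_sub = laplace_cdf(xs + l / 2, x0, delta) - laplace_cdf(xs - l / 2, x0, delta)
    p_stay = laplace_cdf(L, x0, delta) - laplace_cdf(0.0, x0, delta)
    return np.where(recaptured, p_sub, p_stay)

def docm_loglik(x, x0, delta, r, L, l):
    """Bernoulli log-likelihood with success probability pi_i * r."""
    y = ~np.isnan(x)
    theta = np.clip(movement_prob(x, x0, delta, L, l) * r, 1e-300, 1.0 - 1e-12)
    return float(np.sum(np.where(y, np.log(theta), np.log1p(-theta))))

# Two unrecaptured individuals: one released near the edge, one at the centre
L, l, delta = 500.0, 50.0, 100.0
x0 = np.array([25.0, 250.0])
stay = movement_prob(np.array([np.nan, np.nan]), x0, delta, L, l)
print(stay)

# Toy dataset: the first individual recaptured in the subsection centred at 75 m
x_toy = np.array([75.0, np.nan])
ll = docm_loglik(x_toy, x0, delta, 0.5, L, l)
print(ll)
```

For these values the edge release stays with probability of about 0.61, versus $1 - e^{-2.5} \approx 0.92$ for the central release, which is the release-location effect described in the text.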

Evaluation of model performance. To evaluate the performance of the DOCM, I generated test datasets under different sampling designs. Specifically, I focused on the following design factors related to sampling effort in the field: (1) the number of individuals marked and released ($N$ = 100, 500, and 1000 individuals), (2) the length of the observation section ($L$ = 500 and 1000 m) and (3) …

Under each sampling design, I produced 100 test datasets with different values of $\delta$, which were drawn from a uniform distribution (range: 10–300 m). Each independent dataset was generated as follows. First, $N$ individuals were assigned randomly to subsections (Figure 3a). These "marked" individuals were released at the center of their capture subsection, which was recorded as the release location $x_{0,i}$. Second, released individuals relocated along a one-dimensional space according to a known dispersal kernel, $x_{true,i} \mid x_{0,i}, \delta \sim \mathrm{Laplace}(x_{0,i}, \delta)$ (Figure 3b). Individuals were considered to remain in the observation section if the true recapture location $x_{true,i}$ was within the range 0–$L$ m. Then, the remaining individuals were recaptured with recapture probability $r$ (Figure 3c). When an individual was recaptured, its true recapture location $x_{true,i}$ was rounded to the center of the recapture subsection to mimic real field data (Figure 3c). For unrecaptured individuals, $x_i$ was recorded as "NA".

I estimated the average dispersal distance $\delta$ and recapture probability $r$ using the DOCM and a simple dispersal model. The simple dispersal model is a "control" that does not model the observation process; the average dispersal distance was estimated as $x_i \mid x_{0,i}, \delta \sim \mathrm{Laplace}(x_{0,i}, \delta)$ … for the DOCM is provided in Box 1. R and JAGS scripts used in the simulations will be made available on GitHub upon publication.
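The simulation protocol and the comparison with the simple model can be sketched end to end. Because Box 1 and the author's R/JAGS scripts are not reproduced here, this Python sketch swaps MCMC for a grid approximation of the posterior under flat priors, and it assumes that the simple model is a Laplace kernel fitted to recaptured individuals only (for which the maximum-likelihood estimate of $\delta$ is the mean absolute displacement). The design values, the seed, and the subsection length $l$ = 50 m are hypothetical.

```python
import numpy as np

def laplace_cdf(x, x0, delta):
    z = (x - x0) / delta
    tail = 0.5 * np.exp(-np.abs(z))
    return np.where(z < 0, tail, 1.0 - tail)

def simulate(N, L, l, delta, r, rng):
    """One test dataset following the simulation steps described in the text."""
    centres = np.arange(l / 2, L, l)                  # L / l subsection centres
    x0 = rng.choice(centres, size=N)                  # (a) random release subsections
    x_true = rng.laplace(loc=x0, scale=delta)         # (b) Laplace dispersal
    stay = (x_true >= 0) & (x_true <= L)              # still inside the section
    caught = stay & (rng.random(N) < r)               # (c) recaptured with probability r
    idx = np.clip(np.floor(x_true / l), 0, centres.size - 1)
    x = np.where(caught, (idx + 0.5) * l, np.nan)     # subsection centre, NaN for "NA"
    return x0, x

def log_posterior_grid(x0, x, L, l, deltas, rs):
    """DOCM log posterior on a (delta, r) grid under flat priors."""
    y = ~np.isnan(x)
    xs = np.where(y, x, 0.0)
    out = np.empty((deltas.size, rs.size))
    for k, d in enumerate(deltas):
        p_sub = laplace_cdf(xs + l / 2, x0, d) - laplace_cdf(xs - l / 2, x0, d)
        p_stay = laplace_cdf(L, x0, d) - laplace_cdf(0.0, x0, d)
        pi = np.where(y, p_sub, p_stay)               # equation 4
        theta = np.clip(rs[:, None] * pi[None, :], 1e-300, 1.0 - 1e-12)
        out[k] = np.where(y, np.log(theta), np.log1p(-theta)).sum(axis=1)
    return out

rng = np.random.default_rng(42)
L, l, N, delta_true, r_true = 500.0, 50.0, 1000, 200.0, 0.5   # hypothetical design
x0, x = simulate(N, L, l, delta_true, r_true, rng)

deltas = np.linspace(5.0, 500.0, 100)         # support of the flat prior on delta (m)
rs = np.linspace(0.02, 0.98, 49)              # support of the flat prior on r
logpost = log_posterior_grid(x0, x, L, l, deltas, rs)
post = np.exp(logpost - logpost.max())
post /= post.sum()
delta_docm = float(post.sum(axis=1) @ deltas)  # posterior mean of delta
r_docm = float(post.sum(axis=0) @ rs)          # posterior mean of r

# "Simple" model (assumed): Laplace fitted to recaptured individuals only
rec = ~np.isnan(x)
delta_simple = float(np.mean(np.abs(x[rec] - x0[rec])))
print(f"DOCM: delta = {delta_docm:.0f} m, r = {r_docm:.2f}; "
      f"simple: delta = {delta_simple:.0f} m; n/N = {rec.mean():.2f}")
```

With an observation section only 2.5 times longer than $\delta$, the recaptured-only estimate is pulled downward because long-distance dispersers that left the section never enter its likelihood, whereas the DOCM also uses the unrecaptured individuals through $\pi_i$.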

Results and discussion

Model performance. The DOCM performed well under various sampling designs. Figure 4 shows the relationship between the true and estimated values of $\delta$ (denoted as $\delta_{true}$ and $\delta_{est}$, respectively) when the recapture probability $r$ was 0.50. The parameter estimates from the DOCM were always closer to the true values than those derived from the simple dispersal model without the observation process (compare red and black lines in Figure 4). The degree of improvement was substantial. While the 95% CIs of the simple dispersal model tended not to include $\delta_{true}$, those of the DOCM were more likely to encompass the true values, especially when the observation section was long enough ($L$ = 1000 m). Similarly, the DOCM provided less biased estimates of the recapture probability $r$, a composite of survival and detection probabilities (Figure 5). The estimated $r$ was higher than the proportion of individuals recaptured in the test dataset ($n/N$, where $n$ is the number of recaptured individuals) because it was corrected for permanent emigration. However, as $\delta_{true}$ increased, the DOCM increasingly underestimated the parameters, though the bias remained smaller than that of the simple dispersal model. This pattern was apparent when the observation section was short relative to $\delta_{true}$ and was caused by the substantial number of individuals leaving the observation section. For each parameter, these results were qualitatively similar irrespective of $r$, although higher values of $r$ led to narrower 95% CIs for $\delta_{est}$ as more individuals were recaptured. Detailed results with different values of $r$ are provided in Figures S1–S4.

The narrower … clearly when $N$ increased from 100 to 500 individuals. A further increase in $N$, however, did not improve the precision of the parameter estimates. Increasing $N$ did not improve the accuracy of the parameter estimates (i.e., their closeness to $\delta_{true}$ and $r_{true}$).
In contrast, the length of the observation section $L$ had a greater influence on the accuracy of $\delta_{est}$, while having little influence on the precision of the parameter estimates (Figures 4 and 5). Increasing $L$ improved the accuracy of $\delta_{est}$ because long-distance dispersers were more likely to be recaptured. Neither accuracy nor precision improved when the spatial resolution of sampling increased (i.e., smaller $l$).
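The two emigration effects discussed above, $n/N$ falling below $r$ and the truncation of observed dispersal distances by a short observation section, can be checked with a little arithmetic. The hypothetical Python sketch below assumes releases spread evenly over subsection centres with $l$ = 50 m. The expected recapture fraction is $E[n/N] = r \cdot \frac{1}{K}\sum_{k=1}^{K} \Pr(0 \le x \le L \mid x_0 = c_k)$ over the $K = L/l$ subsection centres $c_k$, and the mean observed distance among individuals that stay is estimated by Monte Carlo.

```python
import numpy as np

def p_stay(x0, delta, L):
    """P(0 <= X <= L) for a Laplace kernel centred on x0, with 0 <= x0 <= L."""
    return 1.0 - 0.5 * np.exp(-x0 / delta) - 0.5 * np.exp(-(L - x0) / delta)

def expected_recapture_fraction(delta, r, L, l):
    """E[n/N] when releases are spread evenly over the subsection centres."""
    centres = np.arange(l / 2, L, l)
    return r * np.mean(p_stay(centres, delta, L))

def mean_observed_distance(delta, L, l, n=200_000, seed=7):
    """Monte Carlo mean |x - x0| among individuals that remain in [0, L]."""
    rng = np.random.default_rng(seed)
    x0 = rng.choice(np.arange(l / 2, L, l), size=n)
    x = rng.laplace(loc=x0, scale=delta)
    inside = (x >= 0) & (x <= L)
    return np.mean(np.abs(x[inside] - x0[inside]))

r, l = 0.5, 50.0                                 # hypothetical values
for L in (500.0, 1000.0):
    for delta in (50.0, 150.0, 300.0):
        f = expected_recapture_fraction(delta, r, L, l)
        m = mean_observed_distance(delta, L, l)
        print(f"L={L:5.0f} delta={delta:4.0f}: E[n/N]={f:.3f} (r={r}), "
              f"mean observed distance={m:6.1f} m")
```

Both quantities degrade as $\delta$ grows relative to $L$: fewer individuals remain to be recaptured, and those that remain are biased toward short movements, which is what a model that ignores unrecaptured individuals ends up fitting.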

Usefulness and limitations. The DOCM worked well under various sampling designs, demonstrating its usefulness for inferring dispersal processes in the wild. The DOCM can extract information from both recaptured and unrecaptured individuals, thereby improving the accuracy of parameter estimates. The DOCM, therefore, represents a promising tool for studying dispersal processes. To apply the DOCM, users must obtain the following data: (1) individuals marked uniquely or by release subsection; (2) release location ($x_{0,i}$); (3) recapture location ($x_i$); (4) spatial resolution, i.e., subsection length ($l$); and (5) observation section length ($L$). These data are routinely collected in mark-recapture studies, so no additional field work may be required to use the DOCM. Furthermore, if users have an independent estimate of detection probability through multiple-pass removal (Dorazio et al. 2005) …

This model connects the trait variable $z_i$ with the dispersal parameter by estimating the coefficients $\alpha$ and $\beta$. In this model, individuals follow different dispersal kernels according to their ecological trait(s), such as body size. If $z$ is a random variable that follows a normal distribution with mean $\mu_z$ and standard deviation $\sigma_z$, $\mathcal{N}(z; \mu_z, \sigma_z)$, then the composite dispersal kernel $h(x_i, x_{0,i}, \mu_z, \sigma_z)$ is:

$$h(x_i, x_{0,i}, \mu_z, \sigma_z) = \int_{-\infty}^{\infty} f\big(x_i \mid x_{0,i}, \delta(z)\big)\, \mathcal{N}(z; \mu_z, \sigma_z)\, dz$$

where $\delta(z)$ is the dispersal parameter predicted for trait value $z$ and $f$ is the Laplace kernel of equation 1. If $z$ is a binary variable drawn from a Bernoulli distribution with success probability $\omega$, the composite dispersal kernel is:

$$h(x_i, x_{0,i}, \omega) = \omega\, f\big(x_i \mid x_{0,i}, \delta(1)\big) + (1 - \omega)\, f\big(x_i \mid x_{0,i}, \delta(0)\big)$$

…

Second, the observation model can also be extended to account for individual-level variability in recapture probability $r_i$. Survival and detection probabilities may vary among individuals, and ignoring this complexity could bias estimates of dispersal parameters. The simplest way to account for this variability is to model $r_i$ as a random variable drawn from a Beta distribution:

$$r_i \sim \mathrm{Beta}(a, b)$$

where $a$ and $b$ are shape parameters. This allows the model to account for individual-level variation in recapture probability $r_i$.
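If there are hypothesized predictors that could influence the recapture probability (e.g., habitat structure), such effects can be modeled as:

$$r_i \sim \mathrm{Beta}\big(\mu_i \theta, (1 - \mu_i)\theta\big), \qquad \mathrm{logit}(\mu_i) = \gamma_0 + \gamma_1 w_i$$

where $\mu_i$ is the expected recapture probability, $\theta$ the dispersion parameter, $\gamma_0$ the intercept, and $\gamma_1$ the regression coefficient for predictor $w_i$. Therefore, the DOCM can be extended to deal with the complexity of field data.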
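The extensions above can be sketched numerically. In this hypothetical Python illustration, the trait model is assumed to be a log link, $\delta(z) = \exp(\alpha + \beta z)$, because its exact form is not shown in this section; the normal-trait kernel is integrated with Gauss–Hermite quadrature from `numpy.polynomial.hermite_e.hermegauss`; and the Beta regression uses a logit link. All coefficient values and the predictor $w_i$ are invented for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def laplace_pdf(x, x0, delta):
    """Laplace dispersal kernel of equation 1."""
    return np.exp(-np.abs(x - x0) / delta) / (2.0 * delta)

def delta_of_z(z, alpha, beta):
    """HYPOTHETICAL log link from trait z to the dispersal parameter delta."""
    return np.exp(alpha + beta * z)

def h_normal(x, x0, mu_z, sigma_z, alpha, beta, n_nodes=40):
    """Composite kernel for z ~ Normal(mu_z, sigma_z) by Gauss-Hermite quadrature."""
    t, w = hermegauss(n_nodes)                   # nodes/weights for weight exp(-t**2 / 2)
    deltas = delta_of_z(mu_z + sigma_z * t, alpha, beta)
    dens = laplace_pdf(np.asarray(x, dtype=float)[..., None], x0, deltas)
    return dens @ w / np.sqrt(2.0 * np.pi)       # the weights sum to sqrt(2*pi)

def h_bernoulli(x, x0, omega, alpha, beta):
    """Composite kernel for binary z ~ Bernoulli(omega): a two-component mixture."""
    return (omega * laplace_pdf(x, x0, delta_of_z(1.0, alpha, beta))
            + (1.0 - omega) * laplace_pdf(x, x0, delta_of_z(0.0, alpha, beta)))

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

# Hypothetical trait coefficients: delta ~ 55 m when z = 0 and ~ 148 m when z = 1
alpha, beta = 4.0, 1.0
grid = np.linspace(-3000.0, 3000.0, 60_001)
dx = grid[1] - grid[0]
mass_normal = np.sum(h_normal(grid, 0.0, 0.0, 0.5, alpha, beta)) * dx
mass_binary = np.sum(h_bernoulli(grid, 0.0, 0.3, alpha, beta)) * dx
print(f"composite kernels integrate to {mass_normal:.4f} and {mass_binary:.4f}")

# Individual recapture probabilities driven by a HYPOTHETICAL predictor w_i
rng = np.random.default_rng(3)
n = 50_000
w_i = rng.normal(size=n)                         # e.g. standardized habitat structure
gamma0, gamma1, theta = 0.0, 0.8, 20.0           # hypothetical intercept, slope, dispersion
mu_i = logistic(gamma0 + gamma1 * w_i)           # expected recapture probability
r_i = rng.beta(mu_i * theta, (1.0 - mu_i) * theta)
print(f"mean r_i = {r_i.mean():.3f}, mean mu_i = {mu_i.mean():.3f}")
```

Both composite kernels should integrate to approximately 1 over a wide window, and the simulated $r_i$ should average close to $\mu_i$, since the mean-dispersion Beta parameterization has mean $\mu_i$ and variance $\mu_i(1-\mu_i)/(1+\theta)$.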

Conclusion. Dispersal is a fundamental process that drives the ecology and evolution of various organisms (