## ABSTRACT

The Canadian Wildlife Service (CWS) requires reliable estimates of the harvest of migratory game birds, including waterfowl, to effectively manage populations of these hunted species. The National Harvest Survey is an annual survey of hunters who purchase Canada’s mandatory migratory game bird hunting permit, integrating information from a survey of hunting activity with information from a separate survey of species composition in the harvest. We use these survey data to estimate the number of birds harvested for each species, as well as hunting activity metrics such as the number of active hunters and days spent hunting. The analytical methods used to generate these estimates have not changed since the survey was first designed in the early 1970s. Here we describe a new hierarchical Bayesian integrated model, which replaces the series of ratio estimators that comprised the old model. We are now using this new model to generate estimates for migratory bird harvests as of the 2019-2020 hunting season, and to generate updated estimates for all earlier years. The hierarchical Bayesian model uses over-dispersed Poisson distributions to model mean hunter activity and harvest (zero inflated Poisson and zero truncated Poisson, respectively). It also includes multinomial distributions to model some key components including, variation in total harvest across periods of the hunting season, the species composition of the harvest within each of those periods, and the age and sex composition in the harvests of a given species. We estimated the parameters of the Poisson and the multinomial distributions for each year as random effects using first-difference time-series. This time-series component allows the model to share information across years and reduces the sensitivity of the estimates to annual sampling noise. The new model estimates are generally very similar to those from the old model, particularly for the species that occur most commonly in the harvest, and so the results do not suggest any major changes to harvest management decisions and regulations. However, estimates for all species from the new model are more precise and less susceptible to annual sampling error, particularly for species that occur less commonly in the harvest (e.g., sea ducks and other species of conservation concern). This new model, with its hierarchical Bayesian framework, will also facilitate future improvements and elaborations, allowing the incorporation of prior information from the rich literature and knowledge in game bird management and biology.

Reliable estimates of the harvest of migratory bird populations are necessary for Canada to manage populations of migratory game birds and to meet its commitments under the Migratory Bird Convention Act (Migratory Birds Convention Act, 1994, S.C. 1994, c. 22). Declining waterfowl populations in the early 1900’s was the first indication that there could be negative effects on the sustainability of some migratory bird populations if there was no protection against excessive hunting (Nichols et al. 1995, Cooch et al. 2014). Under the Migratory Bird Convention Act, the Canadian Wildlife Service (CWS) has implemented regulations in order to prevent collapses of such populations while simultaneously allowing for recreational hunting. Management actions include designated hunting seasons, daily bag limits and possession limits.

Harvest data and estimates of harvest are increasingly being used as one of the main data sources for estimating various parameters of population dynamics, such as determining population size (Alisauskas et al. 2009, 2014, Zimmerman et al. 2017), fecundity (Zimmerman et al. 2010, Osnas et al, 2016), sex ratio (Hagen et al. 2018), as well as in integrated population modeling efforts (Saunders et al. 2019). Reliable estimates of harvest are particularly important for harvested species with no reliable, population-wide count-based surveys, such as many Arctic-breeding geese.

The CWS estimates the annual recreational harvest of migratory game birds in Canada using the National Harvest Survey (NHS). The program uses information gathered from purchasers of the Migratory Game Bird Hunting Permits (MGBHP). These permits are mandatory for all non-indigenous individuals to hunt migratory game birds in Canada, and so they provide a sampling universe from which the survey can draw (Sen 1976, Cooch et al. 1978). The NHS was initiated in 1967, and after several years of fine-tuning, has been conducted annually using the same analytical methods since 1976 (Sen et al. 1975). The National Harvest Survey website (https://wildlife-species.canada.ca/harvest-survey) is the central platform that annually distributes a variety of estimates related to harvests and hunting activity. For example, published estimates include: the number of active waterfowl hunters, the number of waterfowl hunting days, the number of successful hunters, species-specific harvest, and age-ratios. Providing reliable estimates is a critical part of the National Harvest Survey program since they, along with other CWS monitoring programs, are used to assess the status of migratory game bird populations in Canada (Canadian Wildlife Service Waterfowl Committee. 2020) and levels of sustainable harvest (Gilliland et al. 2009, Palumbo et al. 2020).

The analytical methods used to generate estimates of migratory game bird harvest could benefit from contemporary model-based approaches such as hierarchical Bayesian models (Dorazio et al. 2016). These model-based hierarchical approaches provide a coherent framework for sharing information through time, and among geographic strata and/or hunter classes (Cressie et al. 2009). Bayesian approaches provide both improved estimates of uncertainty and a transparent and explicit way to incorporate the ecological and sociological knowledge (Gelman et al. 2013, van de Schoot et al. 2021) that comes from the rich history of game bird population biology and harvest management in North American (Nichols et al. 1995, NAWMP 2018). Since the beginning of the survey, the estimates of harvests and hunting activity have been made using design-based equations for ratio estimates (Cochran 1977), which we refer to here as the “old model”. The relevant ratios included the ratios of harvest per hunter, hunting-days per hunter, and the number parts of a given species per total number of parts (Sen et al. 1975). This design-based old model calculated these mean ratios for each stratum of a stratified random sample in a given year (Cooch et al. 1978). However, each year’s estimates were derived independently of all other years, and were therefore particularly sensitive to variation among years in the sample, response rates to the survey, and the total number of hunters.

In this study, we describe a new hierarchical Bayesian integrated model (hereafter “the model” or “the new model”) that estimates annual harvest of all species of migratory game birds hunted in Canada, as well as summary estimates of hunter activity and total number of ducks, geese, and other major species groups. With this new model, we expect to generate estimates that are more precise and less sensitive to sampling noise, particularly for species of conservation concern or those less abundant in the harvest. We provide the full code required to run it in an online supplement, which greatly increases the transparency of these estimates over the old model, which was never published in a formal and comprehensive way. In addition, the Bayesian framework of this new model will allow for future improvements using informative priors that incorporate the ecological and sociological knowledge that underlies the long history of waterfowl harvest management in North America (NAWMP 2018).

## METHODS

### Overview of the NHS

The NHS sampling methods have remained the same since 1976. It is separated into two primary components: the harvest questionnaire survey (HQS), and the species composition survey (SCS). Hunters responding to the HQS provide information on the total harvest of broad groups of species (e.g., all ducks, geese, and other non-waterfowl species), where they hunted, and the total number of days spent hunting, as well as calendar information indicating how many birds in each group they harvested on each day of the season. We stratified the HQS responses based on hunter residency (Canadian vs US residents) and their previous hunting activity. Hunters responding to the SCS provide information on the species composition of their harvest by submitting wings (ducks and murres) or tail-fans (geese) that are identified to species, and if possible, aged and sexed, by waterfowl biologists. We then integrated the overall hunting information of the HQS with the species composition information of the SCS to generate estimates of the species-level harvest in each year.

We randomly selected potential participants for the HQS and the SCS using records in the MGBHP database. We separated permit holders into 24 geographic hunting zones (Fig. 1) and for the HQS, we further separated them into four classes (Table 1). We selected hunters in categories A and E from the current year’s permit records, while those in categories B and D are selected from the previous year’s permit records. This classification is based on their country of residence and whether or not they held a permit in previous years. The precision of the estimates increase by grouping the hunters into one of the four classes because it takes into account differences in hunting activity and success.

We selected SCS participants differently than those in the HQS. In order to distribute plastic envelopes for the wing and/or tail samples before the start of the hunting season, we selected participants from the previous year’s MGBHP database. First, we determined a hunter’s willingness to participate in the survey by mailing a participation screening card in early July. We generated this random sample of permit holders based on survey participation history, hunting success, and permit renewal status (Table 2). We only sampled Canadian residents in the SCS because of the challenges in delivering envelopes and receiving bird-parts across international borders. Hunter selection is biased toward hunters who previously cooperated. This is beneficial because it increases the response rate, the estimate precision, and the cost efficiency of the survey. However, to limit the bias caused by repeatedly re-sampling the same hunters, we removed two-year SCS participants from the sample for at least one year.

Due to changes over time in how permits have been purchased and hunters have been sampled for the survey, the allocation of permitted hunters and survey responses to each zone has also varied through time. Until recently, we sampled hunters based on the province and zone where they purchased their permit, because no information on the location of hunting activity was available at the time of sampling. Hunters who were sampled for the HQS indicated where they did most of their hunting. In general, the zone of purchase was predominantly the same as the zone of harvest. In recent years, many permits have been sold through an online portal where the zone of purchase is not relevant, and so we started asking hunters to indicate at the time of purchase where most of their hunting activity will take place. We then used this information to link a permit record to a sampling zone. In the new model, we included transformation factors to account for the proportion of hunters that were sampled in a different zone than the one in which they hunted. These transformation factors will become less relevant once the transition to online-sold permits is completed and we can link all permits records to hunting zones based on intended location of hunt.

### The New Model

Using the new hierarchical Bayesian integrated model described here, we estimated the mean group-level (e.g., all ducks) harvest of hunters using an over-dispersed, zero-inflated, Poisson distribution, and the data from the same hunter on total days (days spent hunting) as an over-dispersed, zero-truncated Poisson distribution. We used the HQS calendar responses (how many birds were harvested on each day of the season) to estimate the proportions of the total harvest that occurred within each of a series of discrete periods of the season using multinomial distributions. We then portioned this period-specific total harvest across species by integrating with the SCS data on the number of birds of each species harvested within each period using a series of additional multinomial distributions. We corrected for the period-specific portioning because of the known decline in response rates to the SCS over the course of the season. Finally, we summed the period-specific total harvest estimates across periods to estimate the annual harvest of each species. We included in each component of the model (e.g., total harvest, total days, period-proportions of the total harvest, species proportions in a given period) an explicit random-walk time-series sub-model that shares information between sequential years. This time-series component assumes that each of these parameters has some relationship to the same parameter in the previous year, owing to long-term patterns in species abundance, hunter behaviour, and hunting regulations. For example, the estimated mean number of days spent hunting in a given year is a function of the same value in the previous year, plus some random error. These random-effect time-series components allow the model to share some information through time, while still allowing for any shape of year-to-year change, including smooth trends (e.g., declines in overall hunting through time), annual fluctuations (e.g., seasons with very poor weather that may have reduced activity or harvest success), and step-changes (e.g., introduction of new regulations that change harvest). Finally, we transformed the per-hunter mean values for group-level and species-level harvest to population-level estimates of total harvest using information on the total size of the permit-population hunting in a given zone (i.e., the total number of permitted hunters hunting in a given zone). We apply the model separately for the province and zone where harvest-activity took place.

To simplify the following sections in which we detail the specific components of the model, we use ducks as the example group. However, the model is the same for the other two groups of species, geese and murres (in Newfoundland and Labrador Zone-1), that include calendar and species-composition information.

### Total Harvest and Total Days

Each HQS response includes data on the total reported number of ducks harvested by hunter - *h*, in class – *c*, and year – *y*, which we modeled as an over-dispersed, zero-inflated, Poisson distribution:
We modeled the mean of the Poisson distribution with a log-link, as a function of the year-effect (*α*_{y}), an annual class-effect (*β*_{c,y}), and an observation-level hunter-effect (*χ* _{h,c,y,}), plus an offset for the log of the estimated mean number of days reported by the same hunter (*log*(*λ*_{h,c,y})), and a zero-inflation component (*z*_{h,c,y}), which is the outcome of a Bernoulli trial and equals 1 with estimated probability of in year -*y*. The value represents the proportion of active hunters that harvest no ducks in that year, in addition to the zero-harvests expected from the Poisson distribution. This extra parameter modeling zero-harvests accounts for active waterfowl hunters that only hunt one group of waterfowl (e.g., only hunt geese and not ducks), and must be modeled because the survey does not ask hunters to indicate what types of waterfowl they prefer to hunt: geese, ducks, or both.
Each HQS response also includes data on the total reported number of days spent hunting by hunter - *h*, in class – *c*, and year – *y*. We modeled these data on number of days hunting as an over-dispersed, zero-truncated, Poisson distribution:
We modeled the mean of the zero-truncated Poisson (-*λ*_{h,c,y}) with a log-link, as a function of the year-effect (*γ*_{y}), an annual class-effect (*δ*_{c,y}), and an observation-level hunter-effect (*ε*_{h,c,y}).
We modeled the year-effects for both harvest and days using a random-walk, first-difference time-series sub-model. We estimated the year-effects in the first year (*α*_{1} and *γ*_{1}) as fixed-effects with a zero-mean, normally-distributed prior with a variance of 10 (e.g., *α*_{1}∼*N*(0,10)). We estimated the remaining year-effects in year-y as random effects with a mean equal to the year-effect in the previous year and an estimated variance (e.g., ). We set the priors for the variance of the year-effects as weakly informative priors on the standard deviations, following Gelman et al. (2006), using a half t-distribution with mean = 0, variance = 0.5 and degrees of freedom = 50 (*σ*_{α}∼|*t*(0,0.5,50)|). This prior places approximately 95% of the prior density at values < 1.0, but includes a relatively long tail that allows for much larger values, if supported by the data. Given the common scale of parameter estimates in a log-link model such as this one, standard deviation values > 1.0 for random effects are extremely unlikely and so this prior is only very weakly informative (Gelman et al. 2006). We used this weakly informative prior on all sigma values in the model (e.g., *σ*_{δ}, *σ*_{β}, and *σ*_{ε} below).

We fixed the class-effect parameters contributing to the days and harvest components for the class with the largest number of hunters (class-D) at 0 in all years, so that the remaining class-effects were estimated as departures from the largest class (*β*_{1,y} =0 and *δ*_{1,y} = 0). We modeled the class-effects for the remaining classes in each year as normally distributed random-effects with a class-specific hyperparameter mean and estimated variance and . We gave the hyperparameter means (*B*_{c} and *Δ*_{c}) normally-distributed priors with a variance of 10 (e.g., *B*_{c} ∼*N*(0,10)).

We estimated the observation-level hunter-effect parameters for days and harvest as zero-mean, t-distributed random-effects with class-specific variances and degrees of freedom and ). Using the t-distribution to model these over-dispersion effects allows the modeled hunter-level variation to fit heavier tails than a normal distribution (i.e., a greater number of extreme values in the tails of the distribution than predicted by a normal distribution). We used a normal distribution in early versions of the model, but in most zones and years the empirical distributions of these hunter-level effects showed much heavier tails than a normal distribution. These heavy tails capture the influence of particularly active and successful hunters, in particular. We gave priors to the degrees of freedom parameters ( and ) with gamma distributions with shape and scale set to 2 and 0.2 respectively. We estimated the parameters of the zero-inflation for the harvest using a logistic regression sub-model that used a random walk time-series to track changes in these parameters over time. We estimated the logit of the first year zero-inflation parameter as a fixed effect in year-1 with a half-Cauchy prior following Gelman et al. (2008), which is a weakly informative prior with a reasonable scale for logistic regression coefficients. In all subsequent years, the logit of was a function of the value in the previous year, plus random variation.

### Correction Factors to Estimate Number of Active and Successful Hunters

We estimated the binomial probability that a hunter in class-c and year-y actively hunted using the known numbers of HQS respondents who purchased a permit in that year, and the number of respondents who indicated they had > 0 days of hunting . For each class-c, we modeled the series of values for each year using a logistic regression sub-model similar to the one used for the zero-inflation component. In the same way, we estimated the binomial probability that a hunter in class-c and year-y who actively hunted was successful (i.e., harvested > 0 birds, ) using the known numbers of HQS respondents who indicated they had > 0 days of hunting, and the number of HQS respondents who indicated they harvested > 0 birds . For each class-c, we modeled the series of values for each year using a logistic regression sub-model identical to the one used for the proportion that were active. Similarly, we used similar time-series, logistic regression models to correct for inter-zone hunting. We modeled the annual probability that hunters sampled in a given zone would hunt mostly outside that zone and the annual probability that a hunter hunting mostly inside the zone was sampled outside that zone . We derived the data for these two sub-models from a cross tabulation of the zone of hunt and the zone of sampling for all respondents. So for a given zone and year, we modeled the probability that a hunter sampled in that zone would hunt outside the zone as a binomial distribution using the number of HQS respondents that were sampled in the zone and the number of HQS respondents sampled in the zone who hunted outside the zone . We also modeled the annual values of and using a time-series logistic model, with a structure identical to the one used for the other binomial probabilities.

We combined the correction factors for inter-zone hunting, the proportion of active hunters, and the known total number of permits purchased in each year (*N*_{c,y}) to estimate the total number of active hunters in a given class and year (*A*_{c,y}).
We re-scaled all mean values for hunters, classes, and species to population-level totals using the estimated number of active hunters for a given class and year (*A*_{c,y}).

### Derived Totals of All Group-level Harvest and Activity

We calculated the estimated mean number of days hunting waterfowl (*d*_{c,y}) for class-c and year-y as a derived statistic using the exponentiated sums of the relevant parameters, plus some added variance components to account for the asymmetries in the retransformation.
Where the term is an approximation of the half-variance retransformation from the mean of a log-normal distribution to the mean of the normal, accounting for the t-distributed overdispersion term in the model (, where ). This term is an approximation because the variance of a t-distribution is undefined for some values of the degrees of freedom (Link et al. 2020). We used the same approximation suggested by Link et al. 2020, which they derived by estimating the parameters of a fitted regression model. Although this re-transformation is an area of ongoing research, we have found that it generates estimates of total harvest and activity that are comparable to estimates from the old model.

Similarly, we calculated the estimated mean number of birds (e.g., number of waterfowl) harvested by hunters in class-c and year-y (*k*_{c,y}) using both the variance component related to days and a similar variance component for harvest . In this case, we also included the estimated probability of a non-zero harvest to account for the zero-inflation.

### Harvest by Species and by Age and Sex

We divide the estimates of mean total harvest by year and class into species-specific estimates of harvest using data on the seasonal harvest patterns from the HQS calendars (daily estimates of the number of waterfowl harvested for each respondent) and data on the species-composition information collected from the SCS. On the SCS envelopes in which wing and tail fans are received, hunters also provide information on their current year permit numbers as well as the date and location of harvest. These parts are then identified to species, and in most cases, grouped into demographic categories of age and sex by CWS biologists (Carney 1992). The SCS provides more reliable information on the species composition of the harvest in each zone and at different times of year than would be provided by asking hunters what species they harvested (Smith et al. 1974, Ahlers and Miller 2019). Not all parts can be identified to their demographic categories. We used any available information from each part, so for example the total number of parts summed across all demographic groups within a species will not necessarily equal the total number of parts for that species. Similarly, not all surveys complete the calendar portion of the survey that allows us to estimate the distribution of harvest across the season, but those survey responses can still be used to estimate the total harvest and total days. Our approach that uses all available information assumes that missing components (e.g., calendar information or age information for males and females) are missing at random. This assumption is also used in the old model, and it simplifies the new model and allows us to maximize our sample sizes in estimating the various proportions, such as the harvest in each period or sex ratios.

For calculating the species composition, we divided the harvest season into periods to account for the declining response rates to the SCS as the season progresses (e.g., due to response-fatigue, depleted initial envelope supply, etc. Smith et al 1975). The periods are important to account for differences in the phenology of migration and harvest among species, especially in zones where the hunting season is prolonged. The model generates unbiased estimates of the overall species composition across the entire season by combining separate estimates of the species composition in each period with unbiased estimates of the total harvest in these same periods (Cooch et al. 1978). Without this seasonal correction, the harvest of early migrants that are primarily harvested early in the hunting season such as, blue-winged teal (*Spatula discors*) and wood duck (*Aix sponsa*), would be over-estimated while the harvest of species that migrate and are harvested later in the season, such as scaup (*Aythya spp*.) and common eider (*Somateria mollissima*), would be under-estimated. We based the periods for a given zone and harvest group (e.g. ducks) on weekly divisions that included at least 5% of the total submitted parts. To allow the sharing of information on the seasonal patterns in harvest across years, we kept the periods consistent across all years. For ducks, we divided the season into 6 to 13 periods, depending on the length of the season in a given zone, and the distribution of the cumulative harvest across the season. In most cases, the earlier periods in a zone each represented a single week and the later periods may include more than 1 week. The final period often included many weeks, to capture the low level of hunting that continues through the winter.

The model includes three similarly structured sub-models that rely on multinomial distributions to estimate three key proportional distributions: 1) the proportions of annual harvest that occurred in each period using the calendar data; 2) the proportion of the harvest in each period that can be attributed to each species; and 3) the proportion of each species harvest that can be attributed to the age and sex categories (e.g., adult-female, immature male, etc.). We then combined these proportions with the total harvest estimates to estimate the number of birds harvested for each species in each year, as well as the number of birds in each demographic category for a given species.

For the sub-model for the proportional distribution of the harvest across W-periods, we used data on the number of harvested birds reported in each period-w, by hunter-h in year-y (*DUCKS*_{w,h,y}), and the total yearly number of harvest birds reported by the same hunter (*DUCKS*_{h,y}).
We gave Dirichlet-priors to the individual probabilities for each period-w and year-y , and we estimated the parameters of the Dirichlet using a log-link, random-walk time-series sub-model, similar to the parameters of the logistic regression for the binomial probabilities.
In the first year of the time-series, we kept the value for fixed at 0 for the first period and estimated as a fixed-effect for all other periods using a normal prior with mean of 0 and a variance of 10 .

For the sub-model for the proportional distribution of all S-species in period-w and year-y, we used data on the number of submitted parts of each species-s, period-w, in year-y (*PARTS*_{s,w,y}), and the total yearly number of harvest birds submitted in that period (*PARTS* _{w,y}).
We gave Dirichlet-priors to the individual probabilities for each species-s in period-w and year-y .We estimated the parameters of the Dirichlet using a hierarchical, log-link, random-walk time-series sub-model, similar to the sub-model for the period distributions. However, in this case, we included a time-series year-effect (*τ*_{s,w,y}) as well as a random effect for the mean abundance across periods for a given species (*ϕ*_{s, w}).
In the first year of the time-series, we estimated the value for the year-effect component in each period and species as a random effect using a prior with mean = 0 and a variance specific to the period . For the species mean abundance component (*Φ*_{s}), which is the hyperprior for the random species effect by period, we fixed it at 0 for the first species, and we estimated it as a fixed effect for all other species, using a normal prior with mean of 0 and a variance of 10 (*Φ*_{s} ∼*N*(0,10)).

Finally, we gave Dirichlet-priors to the individual probabilities for each demographic group-d (i.e., each combination of the age and sex categories, such as adult-female, immature-male, etc.) and year-y , and we estimated the parameters of the Dirichlet using a log-link, random-walk time-series sub-model, which is identical to the parameters for the distribution of harvest across the periods. In the first year, we set the value for fixed at 0 for the first period and we estimated as a fixed-effect for all other demographic groups using a normal prior with mean of 0 and a variance of 10 .

### Final Estimates of Harvest and Activity

Finally, for the derived estimates, we re-scaled the relevant mean values (e.g., mean harvest of ducks by class-c in year-y, *k*_{c,y}) to a total estimated harvest of ducks by class-c in year-y (*K*_{c,y}) using the estimated number of active hunters in the same class and year (e.g., *K*_{c,y} = *k*_{c,y}, ∗ *A*_{c,y}). We summed these derived estimates of harvest, days, number of successful hunters, etc. across classes, zones, and provinces to generate the full suite of estimates provided to the public in the annual analysis.

Uncertainties for each of the estimates represent summaries of the full posterior distributions. Since we estimated the zone-level analyses independently using the same MCMC process, we summed the zone-level estimates for each posterior-draw of the MCMC, to estimate the full posterior distributions of the provincial and national estimates.

We implemented the MCMC analysis in JAGS (Plummer 2003), run through R (4.0.2), using the package jagsUI (Kellner 2019). We used a burn-in of 5000 iterations, and retained 3000 posterior-draws from 3 independent chains, thinned at a rate of 1/10. We assessed convergence within and across chains by visualizing trace plots and ensuring Rhat statistics for all interpreted parameters were < 1.1 (Gelman and Rubin 1992). We archived all anonymized survey data and code required to run the analyses reported here in a public repository (https://zenodo.org/badge/latestdoi/254251374).

### Model Checking and Model Fit

To assess the fit of the new model to the data, we employed posterior predictive checks within each of the main sub-models (Conn et al. 2018). For example, in developing the sub-model for the total number of ducks harvested, posterior predictions from earlier fitted models that lacked the zero-inflation component generated too few zero-valued harvests, compared to the observed data. Similarly, earlier versions of the new model that used a normal distribution to model the hunter-level over-dispersion effects generated predicted distributions of hunter level effects that did not fit the observed distribution of reported harvests well. In this case, the observed data tended to have many more high harvest estimates than predicted from the normal distribution. As a result, we adjusted the model to use a t-distribution with heavier tails than the normal to allow the model to predict the relatively large number of high harvests. Comprehensive assessments of fit for models that integrate multiple sources of data are complicated, for example any cross-validation approach requires a decision of what constitutes a unit of the data (Zipkin et al. 2021). In our case, there are nine independent response variables that the model predicts (number of birds harvested, number of days spent hunting, number of birds harvested each day of hunting, number of parts for each species, period, and year, number of parts identified to each demographic group within each species, plus four relatively simple counts of the numbers of hunters used to estimate the various binomial probabilities, such as , etc.). We anticipate some interesting comparisons of prediction error between this model and future modifications, but given the fundamental differences in modeling frameworks between the old and new models, we have focused on comparisons of the estimates and not predictions of data.

### Other Species

As part of the HQS, we also collect information on the harvest of other species of migratory birds, including Wilson’s snipe, American woodcock, sandhill crane, mourning dove, American coot, rails, and band-tailed pigeon. We capture this harvest information using a simplified portion of the questionnaire that collects information on the combined harvest effort for all non-waterfowl species and the harvests of each of the above species. In order to estimate the hunting effort and harvest for each of the non-waterfowl species, we used a simplified version of the new model. In this simplified version, we removed the species composition and calendar components, collapsed some of the hunter strata when samples sizes were small, and included an additional data matrix to ensure zero-harvest estimates in years when there was no allowable harvest for a particular species (i.e., a closed season in a particular zone and year). Otherwise, the components of the simplified new model that estimate harvest and activity through time are the same as they are in the new model for waterfowl and murres. We also included the complete code to run this simplified version of the new model in the online repository.

### Complete Estimates for 1976-2019

We have applied the new model to all data from 1976 – 2019 and generated estimates for all species and group-level harvests, all activity measures (e.g., total hunting days), and all age and sex ratios. We archived this full set of updated estimates, and the code to generate them, online (https://zenodo.org/badge/latestdoi/340784801). Annual updates of these estimates will be made available through the Canadian Wildlife Services harvest survey results website (https://wildlife-species.canada.ca/harvest-survey).

## RESULTS

The estimates of harvest derived from this new hierarchical Bayesian integrated model are generally similar to those from the old model, for data-rich species, zones, and years. For example, annual estimates of national harvests for total ducks and total geese are very similar across the entire time-series (Fig. 2), and over the last decade (2010-2019) the range of the annual percent-difference between new and old models includes both positive and negative values for all duck species (Fig. 3). For the species that are harvested commonly in large numbers, the differences between the two models are relatively small (i.e., species near the top of Figure 3). On average over the last decade (2010-2019), annual estimates of mallard (*Anas platyrhynchos*) and American black duck (*Anas rubripes*) harvest at a national level are 3-5% lower with the new model compared to the old model, and estimates for most other species are generally higher (i.e., species near the bottom of Fig. 3). The new model generally reduces the harvest estimates of the most commonly harvested species in a given zone (e.g., mallard or American black duck in most zones) by a small amount, and redistributes that harvest to species that may have been missing from the old model estimates in some years (Fig. 3). The new model also tends to reduce the magnitude of the annual fluctuations for species and zones that are relatively data-sparse, in comparison to the old model (e.g., northern pintail [*Anas acuta*] in Saskatchewan Zone 3; Fig. 4), but the sharing of information through time in the new model has less influence on the estimated annual fluctuations for the most commonly harvested species (e.g., mallard harvest in Saskatchewan Zone 3 and small-bodied Canada goose harvest in Manitoba Zone 1; Fig. 4). Additionally in cases with extremely sparse data, the new model generates non-zero estimates for species that happen to be absent from the species composition survey in a given year (e.g., black scoter [*Melanitta americana*] in Newfoundland and Labrador Zone 2; Fig. 4). The reduced annual fluctuations are particularly apparent for some species of conservation concern (Fig. 5).

The new model estimates are generally more precise than the old model estimates, both for the data-rich species (e.g., mallard and American black duck; Fig.6) and for the relatively data-sparse species (e.g., common eider and black scoter; Fig.6). Averaged across all species of ducks and geese, the coefficients of variation for national harvest estimates are reduced by approximately 10% with the new model. With the new model, 23 species of ducks and geese have average annual national harvest estimates with coefficients of variation less than 10% over the last decade (2010-2019), in comparison with only 6 species using the old model. In addition, the precision of the new model estimates is much more stable through time (i.e., smaller annual fluctuations) and is more precise, relative to the old model, in recent years, when sample sizes are much lower than they were at the beginning of the survey.

With the new model, we also generate formal estimates of age ratios in the harvest, which have estimates of uncertainty and are similarly less sensitive to annual fluctuations (Fig. 7), compared to the raw summaries of parts received from the old model. For example, national-scale age ratios for mallards from the new model are all lower than ratios from the old model (Fig. 8). The new model more accurately weights the contribution of parts from different regions. For example, Ontario and Saskatchewan vary a great deal in both the observed age ratios in the parts and the participation rates in the SCS. Specifically, in Ontario, survey participants have traditionally submitted almost three times as many parts in relation to the total harvest, than have been submitted in western provinces such as Saskatchewan. The new model accounts for the bias in participation rates and generates unbiased estimates of the national age ratios (Fig. 8).

## DISCUSSION

This new model represents an important advance in estimating and managing the harvest of migratory birds. The general benefits of a hierarchical Bayesian framework are well recognized in ecological data analysis (Cressie et al. 2009, Link and Barker 2010, Gelman et al. 2013, Dorazio 2016), and recently in estimating harvest (Arnold 2019), or harvest rate using American black duck band returns (Conroy et al. 2005, Roy et al. 2015). By assuming that many of the key components of the harvest in a given year are similar to that same component in the previous year, the model allows annual estimates to vary through time, but dampens large annual fluctuations due to sampling noise. As a result, the model generates estimates of migratory bird harvest that are less susceptible to sampling error in a given year, particularly for species that occur less frequently in the species composition survey (e.g., Koneff et al. 2017). These less common species include some that have particularly important conservation and management concerns, including harlequin duck (*Histrionicus histrionicus*) and Barrow’s goldeneye (*Bucephala islandica*): two duck species with eastern populations assessed as “special concern” by the Committee on Status of Endangered Wildlife in Canada (COSEWIC 2001 and COSEWIC 2013), but that still appear sporadically in the harvest. By contrast, the traditional analysis would estimate no harvest for less common species if no parts were submitted in a given year, such as all national estimates for harlequin duck since 2012 (Fig. 5). Harvest estimates from the new model are essentially the same for the most commonly harvested species as estimates from the old model that used a series of design-based equations. The new model estimates are generally more precise, which likely derives from the time-series components of the model that share information through time and take advantage of the similarities from one year to the next in the pool of hunters, regional species composition, hunting preferences and behaviours, and regulation packages. The similarity between the new and old model estimates suggests that there is no need to revisit past management decisions, and that these new estimates can integrate seamlessly with existing decision processes for harvest management.

This new model also provides more realistic and practical estimates of uncertainty than the traditional approach. It includes a coherent data-generating model based on the non-negative counts of harvested birds, days spent hunting, etc., and a log-link component that ensures all predictions of harvest are non-negative. The old model used ratio estimators and their standard errors, and so confidence intervals could not be directly estimated (e.g., a naïve interval would include negative numbers of harvested birds for highly uncertain or low-magnitude estimates). The Bayesian framework ensures that all estimates relate to the posterior distributions of the parameters that are directly of interest (i.e., number of harvested birds) and so uncertainty intervals can be interpreted as a range of values that has a high probability of containing the true value (Gelman et al. 2013, Dorazio 2016). In addition, the new model provides a more comprehensive integration of the various sources of uncertainty, because it has explicitly defined distributions for the data and for all parameters and it propagates all uncertainty in those parameters through to all estimates of harvest, age-ratios, species compositions, etc. (Cressie et al. 2009, Gelman et al. 2013). For example, the addition of age and sex specific harvest estimates and associated variation as a standard model output should be a useful feature for integrating these improved estimates, which were previously unavailable, into subsequent analyses such as Lincoln estimates of population sizes (Alisauskas et al. 2014, Arnold 2019).

We see this new model as an initial step in an ongoing evolution of our estimation and understanding of migratory bird harvest in Canada. The hierarchical Bayesian framework provides a clear and flexible way to customise the model, modify the prior assumptions, and add additional sources of information. We suggest it would be useful to explore for potential bias in the estimates of sex-ratios or the distribution of harvest across the season that may be introduced by assuming that missing data are missing at random. Also, small modifications to the priors and distributional structures of this model would allow for stronger assumptions about species composition across years and between adjacent periods within years. Currently, the priors on the time-series components are only weakly informative (Gelman 2008, van de Schoot et al. 2021), but more informative priors are likely warranted, given the deep domain-specific knowledge that exists in the waterfowl harvest management community (Banner et al. 2020). For example, it may be beneficial to include informative priors that share information among selected species based on knowledge about species exposure to similar hunting behaviour, such as informing the harvest of Harlequin duck using information about the harvest of Common eider, or scoter species. This model framework also makes it relatively easy to add informative covariates that might influence hunting behaviour, such as weather conditions during peak hunting periods. Additionally, the spatial stratification provided by the hunting zones is a useful simplification that captures a great deal of the variation in harvests (Cooch et al. 1978). However, an inherently spatial treatment of the data could allow this model to track the spatial variation while also sharing some information among neighbouring regions (e.g. Morris et al. 2019) and further improve the quality of the estimates in some data-sparse regions. For example, murres are harvested in zones 1 and 2 of the province of Newfoundland and Labrador, but currently we are unable to estimate murre harvest for zone 2, because there are not enough parts or survey responses. Efforts are underway to increase response rates for the survey and to generate more data to help fill this and other gaps. With these additional data and by sharing information in a hierarchical framework and assuming that some aspects of the murre harvest are similar between zone 2 and zone 1 of the province, we can further improve our estimates.

One of the main areas for future improvements would be a better understanding of the variation among hunters in the species composition of their harvest. Currently, both the new model and the old model pool all parts contributed in a given zone and period, regardless of which hunter contributed the parts. This simplification has been necessary because of the relatively low participation rate of hunters in the SCS (Cooch et al. 1978), the relatively large number of parts required to estimate species proportions in each period of the hunting season, and the complexities of coherently tracking uncertainties in the old model. However, the new model’s hierarchical Bayesian framework provides a clear way to account for this additional source of variation. Future research could consider adding a sub-model to estimate and account for among-hunter variation. This sub-model could share information among years, seasons, and zones. It could also benefit from potential modifications to the surveys to collect additional information on hunter behaviour, such as identifying hunters that specialize on hunting seaducks or that regularly hunt in the winter. In addition, with future plans to transition the survey instrument from its current hardcopy format to an electronic format, it is hoped that individual hunters can be tracked through time as multiple years of information on each hunter would greatly improve our ability to account for the variation among hunters. Similarly, improved accounting for variations among hunters and this new modeling approach provide a coherent framework to integrate estimates of possible response bias and the uncertainty in those estimates of bias (Padding and Royle 2012).

## MANAGEMENT IMPLICATIONS

With this new model and modeling framework, we have generated improved estimates of migratory game bird harvest, particularly for species that are less commonly harvested. The hierarchical nature of the new model and the time-series components that allow annual variation while sharing information through time make more efficient use of the survey data than the old model that treated each year as independent. The new model also makes it possible to estimate age and sex specific harvests that will facilitate more informed management decisions (e.g., Alisauskas et al. 2014). The Bayesian model propagates uncertainty across all the parameters in a coherent and transparent way, so that management decisions can more fully integrate that uncertainty (Nichols et al. 1995). The Bayesian nature of the new model, and the open-access code in the supplement, provides a coherent framework to incorporate the rich prior knowledge of hunting behaviour and species biology, which will facilitate future improvements and elaborations to fill more specific management information needs (Banner et al. 2020).

*Associate Editor*

## Summary for online Table of Contents

This new model provides improved estimates of the hunting activity and harvest of migratory game birds in Canada, as well as new estimates for age- and sex-specific harvest. The Bayesian framework and the open-source code that we provide here will facilitate ongoing improvements and elaborations of the model.

## Footnotes

Adam C. Smith Canadian Wildlife Service, Environment and Climate Change Canada, National Wildlife Research Centre, 1125 Colonel By Dr. Ottawa, ON, K1A 0H3

Revised manuscript after review