Modeling Joint Abundance of Multiple Species Using Dirichlet Process Random Effects

We present a method for modeling multiple species distributions simultaneously using Dirichlet Process random effects to cluster species into guilds. Guilds are ecological groups of species that behave or react similarly to some environmental conditions. By modeling latent guild structure, we capture the cross-correlations in abundance or occurrence of species over surveys. In addition, ecological information about the community structure is obtained as a byproduct of the model. By clustering species into similar functional groups, prediction uncertainty of community structure at additional sites is reduced over treating each species separately. The method is illustrated with a small simulation demonstration, as well as an analysis of a mesopelagic fish survey from the eastern Bering Sea near Alaska. The simulation data analysis shows that guild membership can be extracted as the differences between groups become larger and if guild differences are small the model naturally collapses all the species into a small number of guilds which increases predictive efficiency by reducing the number of parameters to that which is supported by the data.


INTRODUCTION
In recent years there has been considerable development of methodology for modeling and predicting abundance and occurrence of species of interest. Much of this development uses a hierarchical framework for developing models to fit the complexities of the observed data This paper has been submitted for consideration for publication in Environmetrics Environmetrics D. S. Johnson and E. H. Sinclair or natural abundance processes (Cressie et al., 2009;Royle and Dorazio, 2008;Hobbs and Hooten, 2015). Some of these complexities may include: spatial and temporal dependence (Carroll et al., 2010;Latimer et al., 2009;Johnson et al., 2013b;Thorson et al., 2015;Ward et al., 2010;Thorson et al., 2016), nondetection of individuals at sampled sites (Dorazio and Connor, 2014;Royle, 2004), and zero-inflation (Johnson and Fritz, 2014;Thorson et al., 2016). Many of these species distribution models (SDMs) were used to make inference to a single species or one-at-a-time modeling if community inference was desired. However, by not recognizing the fact that species interact, use of single species models for making inference for community abundance and structure can produce misleading results (Clark et al., 2014). Hence, new joint species distribution models (JSDMs), which explicitly model species interactions (or, cross-correlation) have recently been developed (e.g., Dorazio and Connor, 2014;Latimer et al., 2009;Thorson et al., 2015Thorson et al., , 2016. Herein, we propose a novel JSDM approach which models species interactions through membership in a latent ecological guild (Simberlo↵ and Dayan, 1991) or functional group within the sampled range of habitats.
Typically, description of an abundance model begins with a generalized linear model (GLM) structure for the abundance process using a discrete value distribution such as Poisson or negative-binomial. For example, one might model the abundance as a Poisson observation with log-mean that is a function of covariates. Those covariates might include habitat variables or variables related to the sampling procedure which are thought to be related to the observed abundance. Alternatively, one might log transform the abundance and use Gaussian linear models (Johnson et al., 2013b;Johnson and Fritz, 2014;Ward et al., 2010), but the general mean structure is usually the same. Herein, we will focus on the GLM versions. The focus of the abundance modeling is related to either establishing an ecological relationship between (joint) abundance and the environmental covariates or predicting abundance at unsampled locations.
To extend the single species GLM oriented model to account for interactions of multiple species and improve prediction and inference of community structure and joint abundance, there have been several approaches which di↵er in the details of interaction modeling, but all were placed in the GLM framework by adding random e↵ects which are either directly correlated between species (Clark et al., 2014;Dorazio and Connor, 2014;Latimer et al., 2009) or when marginalized from the (log-linear) model imply a cross-species correlation structure (Thorson et al., 2015(Thorson et al., , 2016. The direct approach of using a free parameter for every pair of species when modeling the species-level correlation has been successfully implemented (Clark et al., 2014;Latimer et al., 2009), however, in those studies there were a high number of sampled sites or a low number of species considered. In other studies, unstructured covariance did not produce reliable results (Dorazio and Connor, 2014). Thus, recent e↵orts to contribute novel methodology for JSDMs have focused on reducing the number of parameters used to model species interactions. Dorazio and Connor (2014) used a known species trait proximity matrix to model the species-level covariance matrix using a spatial correlation function. By using the known information on species similarity there are only two parameters necessary to model the cross-correlation. Another low complexity approach has been proposed (Thorson et al., 2015(Thorson et al., , 2016 using linear combinations of latent random e↵ects. Specifically, the latent e↵ects are spatial fields, but the same methodology could be applied using independent random e↵ects. If the number of random e↵ects is small relative to the number of species modeled, the number of parameters necessary for modeling species cross-correlation can be significantly reduced from the unstructured scenario. As a novel alternative, we propose a JSDM that uses latent ecological guilds to model interactions among species and obtain joint abundance inference. Herein, we also consider joint species occurrence as well, where occurrence is defined as the binary presence (i.e., abundance > 0) or absence (abundance = 0) of a species. Dorazio and Connor (2014) used known guild membership of di↵erent species to model independence of some species in a crosscorrelated JSDM. Simberlo↵ and Dayan (1991) defines an ecological guild to be "a group of species that exploit the same class of environmental resources in a similar way." With this definition in mind, we seek to build a model where species are cross-correlated in abundance because there are unknown group e↵ects for some set of covariates, i.e., if the group (guild) structure was known they could be represented by (group ⇥ covariate) interaction terms in the abundance GLM models. To accomplish this task we format the model as a latent class or mixture model (see McLachlan and Peel, 2004). Mixture models or latent class models are often used to model dependance between variables in a nonparametric fashion because for a su ciently large number of groups, marginalizing over the random latent classes can approximate any dependence structure to whatever degree desired (McLachlan and Peel, 2004;Vermunt et al., 2008). It has been shown that this holds even when the conditional models are independent given group membership (Dunson and Xing, 2009). In an ecological abundance context, finite mixture models have been used in the past to model spatial heterogeneity and correlation in a nonparametric fashion Johnson et al., 2013b). In this paper we take inspiration from nonparametric dependence methods used for spatial association and apply it to species interaction in abundance modeling.
In the following section we describe the general infinite mixture framework using latent groups and describe the Dirichlet Process (DP) for modeling group membership and the number of groups. There are several models choices for number and assignment of latent classes, but we utilize the DP due to its long history and good clustering properties (Casella et al., 2014). Parameter estimation in the DP-JSDM is challenging due to the latent class process. We provide a reversible-jump MCMC (RJMCMC; Green 2003) algorithm for making Bayesian inference. Finally, we apply the method to few simulated data sets, as well as, a real data set on mesopelagic fish communities in the eastern Bering Sea, Alaska.

General model framework
We begin the description of the proposed methods with some notation. First we assume there are J surveys, for which abundance (or count index; hereafter we use the term "counts") of I di↵erent species is measured. Let n ij be the observed count for ith species in survey j. We also use the vector notation n i = (n i1 , . . . , n iJ ) 0 and n = (n 0 1 , . . . , n 0 I ) 0 for the n ij , as well as, other quantities described later. For occurrence modeling we denote occurrence as y ij = 1 if n ij > 0 otherwise y ij = 0. In practice, n ij need not necessarily be observed for occurrence modeling. The notation y i and y are similar to the abundance counterparts. For abundance modeling, there are several possible distributions that could be used to model the observed discrete counts, Poisson, negative binomial, zero-inflated Poisson, etc., so we will generically denote this observation model as [n ij |z ij , ] where z ij is a latent Gaussian variable controlling the level of expected abundance and is a set of, possibly nuisance, parameters. The notation "[A|B]" refers to the conditional distribution of A given B. For example, if a Poisson distribution is considered, and is not necessary. In the example analysis of mesopelagic fish surveys we utilize a zero-inflated Poisson (ZIP) model, so, the additional ij parameter is the mixing probability for the extra zeros. For occurrence modeling we use Environmetrics D. S. Johnson and E. H. Sinclair where (·) is the standard normal CDF, that is, a probit link function.
To account for unknown interspecies correlations we take a clustering approach inspired by the analysis of Johnson et al. (2013b) for incorporating spatial structure when there are no reasonable distance metrics or neighborhood groupings are unknown. The model is constructed by envisioning an unknown partition, p, of the species into  p groups such that species within groups behave similarly with respect to the abundance process. For a given p, we model (in vector form) the latent z process with the linear model where • X is a design matrix of covariates for which there are no group-level e↵ects, • is a vector of regression coe cients, where C p is an I ⇥  p binary matrix indicating which species belong to each group in p and H is a J ⇥ q matrix of q habitat covariates recorded at the jth survey, • p = ( 0 1 , . . . , 0 p ) 0 is a vector of normally distributed random e↵ects, where, [ k |⌦] = N (0, ⌦), for k = 1, . . . ,  p . • ⌃ is a diagonal matrix with entries 2 ij (for occurance modeling ij = 1).
To reduce the complexity of the proposed model we suggest the following for general practice: (i) for abundance models, set = diag(⌃ 1/2 ) = exp{L✓}, where L is a matrix of design covariates and With respect to (i), there are some useful special cases, namely, L = 1 gives ij = and L = I I ⌦ 1 J gives ij = i . However, the overdispersion parameters could also be modeled 6 based on covariates associated with sampling methods, etc. Suggestion (ii) was formulated from the covariances of the g-prior (Tiao and Zellner, 1964). The g-prior, N (0, ! 2 (H 0 H) 1 ), is an often used prior for regression coe cient parameters. It has the nice benefit that, with a single parameter, it automatically controls the scale of variance and covariance for each coe cient based on the scale of the covariates and their cross-correlation. The exponential reparameterization is used for ease of inference, that is ⇠ can be unconstrained.
The previous description assumed that the correct partitioning of the species is known, however, for most real data sets, the correct partition is unknown. Thus, we must also provide a probability model over partitions, [p|↵], such that marginalization over the unknown partitions creates random coe cient vectors that are nonparametric in their distribution.
A commonly used distribution over partitions is the Chinese Restaurant Process (CRP). A construction definition of the CRP is described as follows, for a given parameter ↵ > 0, 1. A customer enters the restaurant and sits at one of an infinite number of tables.
2. The next customer enters and chooses to sit at the occupied  (Neal, 2000). The density function for the CRP cluster model is given by, where g pk is the size of the kth group in p. Note, that the distribution of p is only a function 7 of the number and sizes of the groups. Realizations of p with the same number of groups and groups sizes have the same probability regardless of which individuals fall in which cluster.
The Dirichlet process is connected to the CRP process because a DP process is constructed using the same procedure to seat the guests in the CRP model. Specifically, in terms of (4), let¯ i be the coe cient associated with the ith species, that is¯ i = is the (i, k) entry of the C p matrix. Now, if¯ i follows a DP then, conditionally, where u i is the number of unique values, k , of¯ i 0 i 0 = 1, . . . , i 1, and n k is the number of species 1 through i 1 belonging to group k. In other words, a new table is represented by a new value of k . Thus, the CRP partitioning combined with the realizations for each group implies that [¯ i |↵, ⌦] = DP(↵, ⌦). Like the spatial covariance model use by Dorazio and Connor (2014), the DP-JSDM also marginally possesses generally positive cross-covariance structure. This makes intuitive sense as one is grouping similar species together or, if species are dissimilar, allowing them to be independent. The covariance structure of the DP-JSDM can be derived by forming an intercept random e↵ect, ⌘ = K p p , such that z = X + ⌘ + ✏, where [✏] = N (0, ⌃). Then, conditioning on the cluster assignment, the covariance matrix of the random e↵ect ⌘ is, and the marginal variance is given by the mixture, where is a matrix with (i, i 0 ) entries equal to the probabilities that species i shares a guild with species i 0 . We term the matrix to be the species proximity matrix due to the fact that it forms a distance, of sorts, in the guild space of the species. Although, the covariance is never negative between any two species, it can be zero, thus those species that occupy di↵erent guilds will have uncorrelated ⌘ random e↵ects, i.e., if It should be noted, however, that although the covariance of the ⌘ random e↵ect is generally, positive, that does not mean that there are only 'positive' (or zero) relationships between species. The clustering is based on the relationship each species has with the chosen covariates. For example, one species may react positively along a covariate gradient ( i > 0) and another reacts negatively along that same gradient ( i 0 < 0), therefore if a new site has a high level of this covariate, the first species will be predicted to be relatively abundant, while the other species abundance will be lower.

Bayesian inference
Because of the hierarchical and variable dimensional nature of the parameter space of the DP-JSDM model we employ a Bayesian approach via MCMC (Markov Chain Monte Carlo) for model fitting and inference. The posterior distribution of interest is given by There are several derived parameters which may be of interest for making desired ecological inference. First, are predictions of community abundance at new locations or times. Second, one may be interested in the overall e↵ect of the environmental covariates for a particular species represented by¯ i . Finally, the matrix C p C 0 p is an I ⇥ I indicator that a species is in the same guild (associated with) another species. The posterior mean of C p C 0 p provides estimated guild proximity matrix, . Finally, the number of guilds,  p (number of columns 9 in C p ) may be of interest. The most direct way to make inferences on the proposed hierarchical clustering model is through a reversible-jump Markov chain Monte Carlo (RJMCMC) algorithm (Green, 2003) to sample the posterior distribution of the parameters and clustering assignment. Here, we provide an overview of the RJMCMC, additional details of the sampler are given in Supplementary Material A.
In our description, we will assume the following prior distributions for the parameters: where I p is an identity matrix of size  p , Q is a known positive-definite matrix, HT ( , d) represents a scaled half-t distribution with scale parameter and d degrees of freedom, and G represents a gamma distribution with parameters a and b. For most of these parameters, the priors can be adjusted to whatever distribution the user would like, the trade-o↵ being a Metropolis-Hastings (MH) update instead of a Gibbs step (e.g., for ) or no di↵erence at all if the parameter has to be updated with an MH step to begin with (!, , and ↵). However, the normal [ p |!, p] prior is necessary to the proposed RJMCMC algorithm. Although, the known Q is not necessary. This is not as critical as it sounds as the marginal distribution is still a nonparametric DP process we just require that the base distribution be a multivariate normal.
The majority of the proposed RJMCMC algorithm is a standard Metropolis-within-Gibbs (hybrid) sampler for a GLM-like model. Conditioned on a realization of p, all the other parameters can be updated with a traditional MH step or a Gibbs step. Hence, we do not focus on their updates here (see Supplementary Material A). However, to update p, the dimension of the p vector will potentially change, necessitating the trans-dimensional aspect of the RJMCMC. Naively, the trans-dimensional moves require a joint (p, p ) proposal which can be rejected often if one of those quantities is a bad fit for the current state of the remaining parameters even though the other is acceptable. Second, proposing new p such that the MCMC chain will mix well over the space of partitions is itself challenging.
Because we are assuming that [z| , p , ] and [ p |!, p] are multivariate normal, the first problem can be handled with the partial-analytic RJMCMC method proposed by Godsill (2001) and utilized by Johnson and Hoeting (2011) and Johnson et al. (2013b) in similar trans-dimensional MCMC applications. The partial-analytic method allows proposal of a new model (p in this case) without jointly proposing the associated model specific parameters ( p ) because they can be analytically marginalized. This is a special case of a collapsed Gibbs sampler (Van Dyk and Park, 2008).
To produce e cient moves through guild space we use the the "individual links" definition of the the CRP process proposed by Blei and Frazier (2011) and subsequently used by Johnson et al. (2013b) for clustering spatial abundance trends. The links version of the CRP process is constructed as follows: 1. A customer enters the restaurant and sits at one of an infinite number of tables.
2. The next customer enters and chooses to sit with the first customer with probability 1/(1 + ↵) or a new table with probability ↵/(1 + ↵).
3. In general, upon entering the restaurant, the i + 1 customer sits with a previous customer (not a table) with probability proportional to 1 or the new customer sits by himself (self-links) with probability proportional to ↵.
4. Groups are constructed by collecting all cliques of the mathematical graph formed by the links between customers. Blei and Frazier (2011) show that this definition of the CRP process is equivalent to the traditional definition presented previously. However, MCMC sampling is now based on sampling independent links between individuals. In terms of the multiple species model, let`i 2 {1, . . . , I} be the link for the ith species. The full conditional distribution of`i is, where p is the partition constructed from all`i and , and 1 {·} is an indicator function for the condition in the brackets. It would be tempting to sample from this discrete distribution in Gibbs fashion, however, note that it depends on p which may be of di↵erent dimension under a di↵erent value of`i. We can collapse over p and use the marginal distribution which does not depend on p . This approach was used by Hoeting (2011) andJohnson et al. (2013b) exactly as described, however, we found that for a large number of species and samples, the covariance matrix K p (I p ⌦ ! 2 Q)K 0 p + ⌃ may be quite large and the inversion necessary to evaluate the [`i|z, , , !, ↵] for each species and potential link would make the chain prohibitively slow in practice. So, we sought an alternative formulation of the marginal distribution that did not require inversion of such a large covariance matrix.
Using Laplace's method (see Kass and Raftery 1995, Section 4 , which are respectively the inverse covariance and mean for the Gaussian full conditional distribution , , !, p]. This is the same distribution used to update p with a Gibbs step following an update of p. Normally, Laplace's method produces an approximation to the integral, but in this case the approximation is exact because the log integrand is quadratic in p (Goutis and Casella, 1999). By writing the integral in this way we need only invert ⌃, which is diagonal, and Q because (I p ⌦ ! 2 Q) 1 = I p ⌦ ! 2 Q 1 . If we use Q = (H 0 H) 1 as previously suggested, then the inverse is trivial. Because, [`i|z, , , !, ↵] is relatively cheap to evaluate for each`i we can use a Gibbs step and draw from the discrete distribution [`i|z, , , !, ↵] for each i = 1, . . . , I, with [z| , , !, p] evaluated using (13) instead of (12).

A SIMULATION PROOF-OF-CONCEPT
To examine the ability of the DP-JSDM model to make inference to species interaction, as well as, to make community abundance predictions, we tested the model and RJMCMC sampler with a small group of simulated data sets. In analyzing the simulated data our objective was to assess whether the DP-JDSM model would, in practice, produce generally correct estimates of the guild structure. Second, would the DP-JSDM exhibit the expected behavior that as ! becomes small, the number of guilds (groups) estimated will go to one as the functional di↵erences between the guilds (with respect to the variables in H) becomes insignificant.

Simulation and Analysis
Data were simulated for I = 20 species, J = 35 samples, and  p = 5 groups. Six data sets were simulated corresponding to ! equal to 0.25, 0.5, 0.75, 1, 1.5, and 2. While the true number of groups is always technically equal to five, the practical di↵erences between the groups tends to zero as ! becomes smaller. The group sizes were g pk = 7, 5, 4, 3, and 1. Three environmental variables composing the guild design matrix H were generated from a standard normal distribution. In addition, a single survey e↵ort variable, x was generated to adjust overall abundance measurement. The global design matrix was set to that is, H matrix is concatenated I times over species. Thus, p denotes guild di↵erences from the overall global e↵ect of the environmental variables, H. In order to maintain identifiability, we imposed the constraint that P p k=1 k = 0. The global coe cient was set to = (2, 1, 0, 1, 0.5) 0 and each k ; k = 1, . . . , 5, was drawn from N (0, ! 2 H 0 H). In these simulations all ij = 0, therefore, z ⌘ X + K p p . However, a common was estimated in each analysis using a Poisson observation model, that is, [n ij |z ij ] = Poisson(e z ij ). The prior distributions used were the same as specified in Section 2.2, specifically, • [ ]: µ = (μ 0 , 0, 0, 0) 0 andμ 0 is the log of the mean observed count and ⌃ = 100(X 0 X) 1 . For each of the six simulated datasets, we sampled the posterior distribution (9) using the RJMCMC algorithm detailed in Supplementary Material A. Each sample consisted of 50,000 iterations following a burnin of 10,000 iterations. We created the multAbund † package for the R statistical environment (R Development Core Team, 2015) which contains the code to run the RJMCMC algorithm described in Supplementary Material A.

Simulation results
As expected, when ! became small the DP-JSDM model was not able to distinguish guild di↵erences between the species and essentially estimated one single group ( Figure   1; ! = 0.25).
[ Figure 1 about here.] As ! increased and guild di↵erences became apparent the model was able to separate the species into their respective guilds reasonably well (Figure 1). In addition, as ! became large the precision with which the number of guilds was estimated increased as well (Figure 2).
[ Figure 2 about here.] There may be some bias as a few of the simulation runs produced p = 6 ( Figure 2; ! = 1 and 2), however, a full simulation experiment would be necessary to assess that fact. Even though we strived to create an e cient RJMCMC algorithm, it is still somewhat computationally intensive.

Data
In our next demonstration of the DP-JSDM we analyze community structure and abundance of fishes that migrate diurnally between three mesopelagic depths in the eastern Bering Sea † Available from github at: https://github.com/dsjohnson/multAbund. The package can be installed from within an R session using the devtools package, but users need to be able to compile source code on their platform as the multAbund package uses C++ code in its routines.
near Alaska. The tendency for most mesopelagic species to vertically migrate makes them an important trophic link between the deep scattering layer and upper surface waters (Sinclair et al., 2015) yet, fundamental aspects of multi-species distributions and relative abundances have not been previously described.
The field e↵ort identified three primary sample stations over highly productive areas of the eastern Bering Sea pelagic (Figure 3). Here we will demonstrate the DP-JSDM using I = 20 of the relatively most common fish species (as opposed to squids, etc.). Many of the species were extremely rare in the survey e↵ort (i.e., one individual observed over the entire study) and were removed.
The variables we put in the H design matrix reflect the belief that the species segregate into guilds based on diurnal vertical migration characteristics. So, the guild covariates recorded for each trawl are daylight cycle (day or night) and depth category (250, 500, or 1000 m). Here we used the full interaction model to define the H design matrix (i.e., '⇠ cycle*depth' in R language model syntax). Because the duration of the trawl varied from survey to survey, the duration was included in the X matrix to model the overall abundance of fish caught in the trawl.

Model and analysis
Initial attempts at fitting a DP-JSDM proceeded in the same manner as the analysis of the simulation data in the previous section. Namely, we used the same Poisson model for the observed abundance counts. However, after initial fittings it became evident that the trawl data set possessed a significant level of zero-inflation relative to the Poisson distribution. This is likely due to the spatial patchiness of pelagic fish occurrence distributions (Benoit-Bird and Au, 2003). In addition, there may also be detection issues in the survey such that a zero count in the trawl does not necessarily mean absence of the species. However, unlike Dorazio and Connor (2014) where 1 {n ij =0} is an indicator of a zero count and i is a species-specific zero-inflation mixture. We used the prior distribution, with scale parameter = 1.5 and degrees of freedom d = 6. This t distributed prior implies a prior distribution for i that is approximately uniform over (0,1). For the remaining parameters we used the same prior specification as the simulated data analysis of Section 3.1.
To assess if there is any improvement gained by using the DP-JSDM we also fitted the 'independent species' JSDM, that is  p = I, to the data. The JSDM we fitted was did not truly treat each species independently because there are shared terms in the X design matrix (i.e., trawl duration) but it allows us to assess improvement in classifying animals into functional guilds relative to cycle and depth over treating them separately. To ascertain the magnitude of improvement we would have liked to be able to use the 'leave one out' Bayesian predictive information criterion (BPIC) given by where n (i,j) is a vector of all observed data except n ij and log[n ij |n (i,j) , , , p , p, , !, ↵] is the log posterior predictive density for the (i, j)th observation. However, it would be computationally infeasible to rerun the RJMCMC for every left out (i, j) entry. So, we used the 'Widely Applicable Information Criterion' (WAIC; Watanabe (2013)) as an approximation (Watanabe, 2010;Link and Sauer, 2016) The WAIC requires only one run of the RJMCMC with the full data set. There are also other selection methods applicable, , however, we found WAIC straightforward to implement for the DP-JSDM.
The model was fitted using the R package multAbund. The RJMCMC algorithm was run for 100,000 iterations following a burnin of 10,000 iterations. The package contains code to fit the Poisson abundance data model as well as the ZIP and Bernoulli probit model for occurrence. In addition to the joint analysis of abundance, we also analyzed the trawl survey data as an occurrence data set where y ij = 1 if n ij > 0, else y ij = 0. The occurrence analysis results are presented in Supplementary Material C.

Results
After fitting the ZIP version of the DP-JSDM and the independent species JSDM we noted there was a substantial improvement in WAIC under the DP-JSDM. WAIC for the DP-JSDM Then by virtue of guild membership, the model described distribution patterns of species for which there is little reported data (i.e., myctophids, Stenobrachias leucopsarus and Diaphus theta).
For instance, L. schmidti and S. leucopsarus formed the tightest cluster in both abundance and occurrence dendograms (Figures 5 and C.2). Each is the most abundant species within their respective families in the Bering Sea (Brodeur et al., 1999;Sinclair et al., 1999) and both were highly represented throughout the water column day and night in this study.
Guild membership with L. schmidti suggests that S. leucopsarus shares a similar life history and foraging strategy wherein juveniles and adults have indistinct vertical migration and are stratified in the water column according to age (size) with adults remaining below 240 m (Beamish et al., 1999;Mecklenburg et al., 2002).
The bathylagid L. ochotensis and myctophid D. theta also form a guild in abundance ( Figure 5) along with Stenobrachias nannochir in occurrence guilds ( Figure C.2). Lipolagus ochotensis and S. nannochir are among the most abundant mesopelagic species in the Bering Sea Mecklenburg et al., 2002). Both are size-stratified by depth with adults residing in the deepest layers and especially present between 500-1000 m (Mecklenburg et al., 2002). As a strong vertical migrator, L. ochotensis is also abundant between 200-500 m Mecklenburg et al., 2002). Little is known about D. theta from directed catch in the Bering Sea, however guild identity with S. nannochir and especially with L. ochotensis suggests they share similar patterns of behavior. The model implication that D. theta is an age-stratified strong vertical migrator available at upper mesopelagic depths (Figure 6, B.1, and C.3) is supported by observations that it is a primary prey item of Dall's porpoise (Phocoenoides dalli) in the top 250 m of water column (Crawford, 1981).
The best example of the degree of fine detail captured by the model was demonstrated by Bathylagus pacificus, a common and abundant species of Bathylagidae that formed its own cluster ( Figure 5). Like other members of its family B. pacificus demonstrates a bimodal pattern in body size at depth (Peden et al., 1985;Mecklenburg et al., 2002). In our study, juvenile fish were concentrated at mid-layer levels during the day (500 meters) rising to 250 meters at night, while adults concentrate at deeper daytime layers (1000 m) rising to 500 m at night (Sinclair and Stabeno, 2002). This vertical migratory movement is apparent in the log abundance plots (Figure 6; and¯ i values in Figure B.1) that together with known age distribution suggest B. pacificus may form its own guild based on abundances at depth driven by varying foraging requirements of juvenile and adults.

DISCUSSION
We presented a new methodology for modeling joint species distributions based on Dirichlet process random e↵ects to model species associations through a latent guild structure. Instead of trying to directly parameterize cross-correlation in a species-specific random e↵ect, we used latent membership in an ecological guild. Species belonging to the same guild followed the same response to environmental conditions through random coe cients e↵ects in a GLM-like setting. Unlike simple cross-correlated species random intercepts, the DP-JSDM provides some valuable information on which species belong to guilds together and for the species within a guild, how they respond to the selected environmental conditions together.
A fundamental aspect of mesopelagic ecology is diel vertical migration. The DP-JSDM successfully identified community structure among 20 species of fish from the eastern Bering Sea within this framework. The selected model parameters of depth and light describe realtime clusters of species that move together similarly through the water column on a 24 hour cycle, presumably in relation to foraging. Based on studies conducted in the North Pacific Ocean, the diets of many of these same species collected from di↵erent depths match vertical distribution patterns of the variety of copepods and euphausiids that they consume (Beamish 21 Environmetrics D. S. Johnson and E. H. Sinclair et al., 1999).
Although the DP-JSDM model was initially designed to model species association, it has the added benefit that it automatically adjusts to the necessary complexity because the number of guilds is also simultaneously being estimated as well. In the simulation experiment it was demonstrated that if there is little di↵erence between the species in their response to the recorded environmental conditions the DP-JSDM will collapse to one guild, that is, no statistical di↵erence between the species. This reduction in model complexity was noted by Johnson et al. (2013b) in reference to spatially clustering abundance trends. In as the observation portion of the model, whereñ ij is the observed abundance of species i at site j during survey k and ijk is the probability of each of the n ij individuals being observed. If one marginalizes over the true abundances, the Poisson observation model results, where E[n ijk ] = exp{log ijk + z ij }. The same approach could also be used for occurrence modeling, in which case, it becomes occupancy modeling, that is, for the observed presencẽ y ijk , we use the hierarchical observation model, where the probability thatỹ ijk = 1 is y ij ijk . The main point being that the process model does not change in either of these two settings, so, the DP-JSDM can easily be adapted to these situations.
There is also an alteration that can be made when many sites are visited and spatial correlation between sights might also be a consideration. We are not calling this an extension, because spatial correlation can be added without making additions to the basic structure presented. All that needs to be changed to add random spatial e↵ects is to use the basis function approach of Ver Hoef and Jansen (2014), Johnson et al. (2013a), or Hefley et al. (2016. In a spatial basis function model, the random spatial field is modeled as ⌘ = H where the columns of the matrix H contain the spatial basis functions evaluated at each of the modeled sites (rows). Each basis column represents a di↵erent frequency. In the notation just presented it should be fairly obvious how the DP-JSDM can be changed to contain spatial correlation, one simply needs to use a basis function matrix for the environmental design matrix. In that case, it might be appropriate to use [ |!] = N (0, ! 2 I) for the DP baseline distribution to match prior specifications that are usually used in spatial analysis.
And, of course, one could combine the spatial model with the previously mentioned detection model extensions to form mutivariate spatial models for occupancy and abundance modeling.

Prior distributions
Here we describe the details for performing the necessary parameter updates in the RJMCMC algorithm. To facilitate the description the reader should recall we use the following prior distributions in full vector form (where appropriate): where T denotes a t distribution, N is a (multivariate) normal distribution, HT is a half-t distribution, CRP is the Chinese restaurant process, and G is a gamma distribution. Now, we can describe the Markov Chain Monte Carlo (MCMC) sampler. The sampler is constructed from repeated draws from the full conditional posterior distributions. We use the notation [x|·] to represent the conditional distribution of the variable 'x' given all of the other model components.

Updating z
We will first describe the updating of z for the abundance models. Unfortunately, for the abundance models used in this paper (e.g., [n ij |z ij , ] = ZIP or Poisson), the full conditional distribution does not exist in a nice closed form and we suspect this is the case for every abundance model one may want to use. The full conditional distribution required for the update is, for which a Metropolis-Hastings (MH) step is used with a random walk proposal distribution [z ⇤ |z] = N (z, R z ), where R z is a diagonal matrix that is tuned for optimal sampling. In the R package multAbund we use the adaptive random walk proposal described by Shaby and Wells (2011) that continually adjusts proposal distribution throughout the MCMC run.
Once the new z ⇤ is drawn, each z ⇤ ij is accepted with probability .
(A.2) Note, that even though z ⇤ is proposed as a vector, the independence of each element implies that each z ⇤ ij can be accepted or rejected independently. If one is analyzing occurrence data with a probit link as described in the main text of the paper, then the full conditional distribution, is available in closed form. For each (i, j), the necessary full conditional distribution is a ij is a truncated normal distribution with lower bound ( 1 for y ij = 0 0 for y ij = 1 (A.5) and upper bound b ij = ( 0 for y ij = 0 1 for y ij = 1 (A.6) (Albert and Chib, 1993). If another link function is used, then the same procedure as the abundance model updates is used with a MH acceptance step.

Udating
Here, the only model used where was present is the ZIP model used in the analysis of the fish survey data. Therefore, we only describe updating of this parameter with respect to the ZIP model with species-specific ZIP parameters, i . The full conditional distribution of logit i is As with the z updates, the adaptive random walk MH update N (logit i , R ) was used were R is continually adapted through the RJMCMC.

Updating ! and
Using an HT family of priors is not directly conjugate, therefore, a MH step is used here as well. Recall that here we are using ⌦ = ! 2 (H 0 H) 1 and ⌃ = 2 I, where ! = exp(⇠) and = exp(✓). These choices could be easily modified if desired. For !, the full conditional distribution is given by when converting to the log parameterization, we obtain the full conditional for ⇠, As in the z updates, we use a normal random-walk proposal [⇠ ⇤ |·] = N (⇠, R ⇠ ), where R ⇠ is adaptively tuned throughout the MCMC run in the way as the z updates. With regards to , the ✓ parameter is updated in an identical fashion with the full conditional distribution given by [✓|·] / N (z|X + K p p , e ✓ I) · HT (e ✓ | , d ) · ✓ (A.10) and adaptive random walk proposal distribution N (✓ ⇤ |✓, R ✓ ).
6 Updating p and ↵ The update of p was described in the main portion of the paper, therefore we omit it here and refer the reader to Section 2.2 for details. The CRP parameter ↵ is updated through an MH step with the previously described adaptive random walk proposal on log ↵. The full conditional distribution is given by [↵|·] / CRP(p|↵) · G(↵|a, b). (A.11) However, as with all of the positive valued parameters, we choose to reparameterize to the log scale to make use of the adaptive random walk proposal distribution. So, the full conditional distribution for log ↵ is The same adaptive procedure was used with an MH acceptance step to sample the full conditional distribution.