## ABSTRACT

Real geography is continuous, but standard models in population genetics are based on discrete, well-mixed populations. As a result, many methods of analyzing genetic data assume that samples are a random draw from a well-mixed population, but are applied to clustered samples from populations that are structured clinally over space. Here we use simulations of populations living in continuous geography to study the impacts of dispersal and sampling strategy on population genetic summary statistics, demographic inference, and genome-wide association studies. We find that most common summary statistics have distributions that differ substantially from those seen in well-mixed populations, especially when Wright’s neighborhood size is less than 100 and sampling is spatially clustered. The combination of low dispersal and clustered sampling causes demographic inference from the site frequency spectrum to infer more turbulent demographic histories, but averaged results across multiple simulations were surprisingly robust to isolation by distance. We also show that the combination of spatially autocorrelated environments and limited dispersal causes genome-wide association studies to identify spurious signals of genetic association with purely environmentally determined phenotypes, and that this bias is only partially corrected by regressing out principal components of ancestry. Last, we discuss the relevance of our simulation results for inference from genetic variation in real organisms.

## Introduction

The inescapable reality that biological organisms live, move, and reproduce in continuous geography is usually omitted from population genetic models. However, mates tend to live near to one another and to their offspring, leading to a positive correlation between genetic differentiation and geographic distance. This pattern of “isolation by distance” (Wright 1943) is one of the most widely replicated empirical findings in population genetics (Aguillon *et al*. 2017; Jay *et al*. 2012; Sharbel *et al*. 2000). Despite a long history of analytical work describing the genetics of populations distributed across continuous geography (e.g., Wright (1943); Rousset (1997); Barton *et al*. (2002, 2010); Ringbauer *et al*. (2017); Robledo-Arnuncio and Rousset (2010)), much modern work still describes geographic structure as a set of discrete populations connected by migration (e.g., Wright 1931; Epperson 2003; Rousset and Leblois 2011; Shirk and Cushman 2014; Lundgren and Ralph 2018). For this reason, most population genetics statistics are interpreted with reference to discrete, well-mixed populations, and most empirical papers analyze variation within clusters of genetic variation inferred by programs like *STRUCTURE* (Pritchard *et al*. 2000) with methods that assume these are randomly mating units.

The assumption that populations are “well-mixed” has important implications for downstream inference of selection and demography. Methods based on the coalescent (Kingman 1982; Wakeley 2009) assume that the sampled individuals are a random draw from a well-mixed population that is much larger than the sample (Wakeley and Takahashi 2003). The key assumption is that the individuals of each generation are *exchangeable*, so that there is no correlation between the fate or fecundity of a parent and that of their offspring (Huillet and Möhle 2011). If dispersal or mate selection is limited by geographic proximity, this assumption can be violated in many ways. For instance, if mean viability or fecundity is spatially autocorrelated, then limited geographic dispersal will lead to parent–offspring correlations. Furthermore, nearby individuals will be more closely related than an average random pair, so drawing multiple samples from the same area of the landscape will represent a biased sample of the genetic variation present in the whole population (Städler *et al*. 2009).

Two areas in which spatial structure may be particularly important are demographic inference and genome-wide association studies (GWAS). Previous work has found that discrete population structure can create false signatures of population bottlenecks when attempting to infer demographic histories from microsatellite variation (Chikhi *et al*. 2010), statistics summarizing the site frequency spectrum (SFS) (Ptak and Przeworski 2002; Städler *et al*. 2009; St. Onge *et al*. 2012), or runs of homozygosity in a single individual (Mazet *et al*. 2015). The increasing availability of whole-genome data has led to the development of many methods that attempt to infer detailed trajectories of population sizes through time based on a variety of summaries of genetic data (Liu and Fu 2015; Schiffels and Durbin 2014; Sheehan *et al*. 2013; Terhorst *et al*. 2016). Because all of these methods assume that the populations being modeled are approximately randomly mating, they are likely affected by spatial biases in the genealogy of sampled individuals (Wakeley 1999), which may lead to incorrect inference of population changes over time (Mazet *et al*. 2015). However, previous investigations of these effects have focused on discrete rather than continuous space models, and the level of isolation by distance at which inference of population size trajectories becomes biased by structure is not well known. Here we test how two methods suitable for use with large samples of individuals – stairwayplot (Liu and Fu 2015) and SMC++ (Terhorst *et al*. 2016) – perform when applied to populations evolving in continuous space with varying sampling strategies and levels of dispersal.

Spatial structure is also a major challenge for interpreting the results of genome-wide association studies (GWAS). This is because many phenotypes of interest have strong geographic differences due to the (nongenetic) influence of environmental or socioeconomic factors, which can therefore show spurious correlations with spatially patterned allele frequencies (Bulik-Sullivan *et al*. 2015; Mathieson and McVean 2012). Indeed, two recent studies found that previous evidence of polygenic selection on human height in Europe was confounded by subtle population structure (Sohail *et al*. 2018; Berg *et al*. 2018), suggesting that existing methods to correct for population structure in GWAS are insufficient. However, we have little quantitative idea of the population and environmental parameters that can be expected to lead to biases in GWAS.

Last, some of the most basic tools of population genetics are summary statistics like $F_{IS}$ and Tajima’s $D$, which are often interpreted as reflecting the influence of selection or demography on sampled populations (Tajima 1989). Statistics like Tajima’s $D$ are essentially summaries of the site frequency spectrum, which itself reflects variation in branch lengths and tree structure of the underlying genealogies of sampled individuals. Geographically limited mate choice distorts the distribution of these genealogies (Maruyama 1972; Wakeley 1999), which can affect the value of Tajima’s $D$ (Städler *et al*. 2009). Similarly, the distribution of tract lengths of identity by state among individuals contains information about not only historical demography (Harris and Nielsen 2013; Ralph and Coop 2013) and selection (Garud *et al*. 2015), but also dispersal and mate choice (Ringbauer *et al*. 2017; Baharian *et al*. 2016). We are particularly keen to examine how such summaries will be affected by models that incorporate continuous space, both to evaluate the assumptions underlying existing methods and to identify where the most promising signals of geography lie.

To study this, we have implemented an individual-based model in continuous geography that incorporates overlapping generations, local dispersal of offspring, and density-dependent survival. We simulate chromosome-scale genomic data in tens of thousands of individuals from parameter regimes relevant to common subjects of population genetic investigation such as humans and *Drosophila*, and output the full genealogy and recombination history of all final-generation individuals. We use these simulations to test how sampling strategy interacts with geographic population structure to cause systematic variation in population genetic summary statistics typically analyzed assuming discrete population models. We then examine how the fine-scale spatial structures occurring under limited dispersal impact demographic inference from the site frequency spectrum. Last, we examine the impacts of continuous geography on genome-wide association studies (GWAS) and identify regions of parameter space under which the results from GWAS may be misleading.

## Materials and Methods

### Modeling Evolution in Continuous Space

The degree to which genetic relationships are geographically correlated depends on the chance that two geographically nearby individuals are close relatives – in modern terms, on the tension between migration (the chance that one is descended from a distant location) and coalescence (the chance that they share a parent). A key early observation by Wright (1946) is that this balance is often nicely summarized by the “neighborhood size”, defined to be $N_W = 4\pi\rho\sigma^2$, where $\sigma$ is the mean parent–offspring distance and $\rho$ is population density. This can be thought of as proportional to the average number of potential mates for an individual (those within distance $2\sigma$), or the number of potential parents of a randomly chosen individual. Empirical estimates of neighborhood size vary hugely across species – even in human populations, estimates range from 40 to over 5,000 depending on the population and method of estimation (Table 1).
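As a quick numerical illustration of the definition above, the following sketch evaluates neighborhood size for a few combinations of density and dispersal distance. The function name and the input values are illustrative assumptions, not quantities taken from Table 1.

```python
import math


def neighborhood_size(density, sigma):
    """Wright's neighborhood size, N_W = 4 * pi * rho * sigma^2."""
    return 4.0 * math.pi * density * sigma ** 2


# Illustrative inputs only: 5 individuals per unit area and three
# hypothetical mean parent-offspring distances.
for sigma in (0.2, 1.0, 4.0):
    print(f"sigma={sigma}: N_W={neighborhood_size(5.0, sigma):.1f}")
```

Because $N_W$ scales with the square of $\sigma$, a twenty-fold change in dispersal distance changes neighborhood size four hundred-fold at fixed density.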

The first approach to modeling continuously distributed populations was to endow individuals in a Wright-Fisher model with locations in continuous space. However, since the total size of the population is constrained, this introduces interactions between arbitrarily distant individuals, which (aside from being implausible) was shown by Felsenstein (1975) to eventually lead to unrealistic population clumping if the range is sufficiently large. Another method for modeling spatial populations is to assume the existence of a grid of discrete randomly mating populations connected by migration, thus enforcing regular population density by edict. Among many other important results drawn from this class of “lattice” or “stepping stone” models, Rousset (1997) showed that the slope of the linear regression of genetic differentiation ($F_{ST}$) against the logarithm of spatial distance is an estimate of neighborhood size. Although these grid models may be good approximations of continuous geography in many situations, they do not model demographic fluctuations, and limit investigation of spatial structure below the level of the deme, assumptions whose impacts are unknown. An alternative method for dealing with continuous geography is a new class of coalescent models, the Spatial Lambda Fleming-Viot models (Barton *et al*. 2010; Kelleher *et al*. 2014).

To avoid questionable assumptions, we here used forward-time, individual-based simulations. By scaling the probability of survival in each timestep to local population density we shift reproductive output towards low-density regions, which prevents populations from clustering. Such models have been used extensively in ecological modeling but rarely in population genetics, where to our knowledge previous implementations of continuous space models have focused on a small number of genetic loci, which limits the ability to investigate the impacts of continuous space on genome-wide genetic variation as is now routinely sampled from real organisms. By simulating chromosome-scale sequence alignments and complete population histories we are able to treat our simulations as real populations and replicate the sampling designs and analyses commonly conducted on real genomic data.

### A Forward-Time Model of Evolution in Continuous Space

We simulated populations using the non-Wright-Fisher module in the program SLiM v3.1 (Haller and Messer 2019). Each time step consists of three stages: reproduction, dispersal, and mortality. To reduce the parameter space we use the same parameter, denoted *σ*, to modulate the spatial scale of interactions at all three stages by adjusting the standard deviation of the corresponding Gaussian functions. As in previous work (Wright 1943; Ringbauer *et al*. 2017), *σ* is equal to the mean parent–offspring distance.

At the beginning of the simulation individuals are distributed uniformly at random on a continuous, square landscape. Individuals are hermaphroditic, and each time step, each produces a Poisson number of offspring with mean 1/*L* where *L* is the expected lifespan. Offspring disperse a Gaussian-distributed distance away from the parent with mean zero and standard deviation *σ* in both the *x* and *y* coordinates. Each offspring is produced with a mate selected randomly from those within distance 3*σ*, with probability of choosing a neighbor at distance *x* proportional to exp(−*x*^{2}/2*σ*^{2}).
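The reproduction and dispersal rules described above can be sketched in a few lines of NumPy. This is only an illustration of the stated rules, not the SLiM implementation: the function name `reproduce` is hypothetical, individuals with no neighbor within 3σ are assumed to produce no offspring, and offspring locations are not constrained to the landscape.

```python
import numpy as np


def reproduce(xy, sigma, lifespan, rng):
    """One reproduction step; returns offspring locations and (parent, mate) pairs."""
    n = len(xy)
    # Each individual draws a Poisson number of offspring with mean 1/L.
    n_offspring = rng.poisson(1.0 / lifespan, size=n)
    locations, pairs = [], []
    for i in np.flatnonzero(n_offspring):
        d = np.hypot(*(xy - xy[i]).T)
        neighbors = np.flatnonzero((d <= 3 * sigma) & (np.arange(n) != i))
        if neighbors.size == 0:
            continue  # assumption: no mate in range means no offspring
        # Mate choice weights proportional to exp(-x^2 / (2 sigma^2)).
        w = np.exp(-d[neighbors] ** 2 / (2 * sigma ** 2))
        for _ in range(n_offspring[i]):
            mate = rng.choice(neighbors, p=w / w.sum())
            # Gaussian displacement with s.d. sigma in both x and y.
            locations.append(xy[i] + rng.normal(0.0, sigma, size=2))
            pairs.append((int(i), int(mate)))
    return np.array(locations).reshape(-1, 2), pairs
```

A call such as `reproduce(xy, sigma=1.0, lifespan=4.0, rng=np.random.default_rng(1))` on an array of coordinates returns one row of offspring coordinates per birth.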

To maintain a stable population, mortality increases with local population density. To do this we say that individuals at distance $d$ have a competitive interaction with strength $g(d)$, where $g$ is the Gaussian density with mean zero and standard deviation $\sigma$. Then, the sum of all competitive interactions with individual $i$ is $n_i = \sum_j g(d_{ij})$, where $d_{ij}$ is the distance between individuals $i$ and $j$ and the sum is over all neighbors within distance $3\sigma$. Since $g$ is a probability density, $n_i$ is an estimate of the number of nearby individuals per unit area. Then, given a per-unit carrying capacity $K$, the probability of survival until the next time step for individual $i$ is

[survival-probability equation missing from source]

We chose this functional form so that the equilibrium population density per unit area is around $K$, and the mean lifetime is around $L$.
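The density-dependent competition described above could be sketched as follows. Two things here are assumptions rather than statements from the text: `local_density` takes $g$ to be the bivariate Gaussian density (so that the sum has units of individuals per unit area), and `survival_probability` is a hypothetical decreasing function of local density used only to show the shape of such a rule, since the exact expression is not available in this text.

```python
import numpy as np


def local_density(xy, sigma):
    """n_i = sum over neighbors j within 3*sigma of g(d_ij), excluding i itself."""
    diff = xy[:, None, :] - xy[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Assumption: g is the two-dimensional Gaussian density with s.d. sigma.
    g = np.exp(-d ** 2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    neighbors = (d <= 3 * sigma) & ~np.eye(len(xy), dtype=bool)
    return (g * neighbors).sum(axis=1)


def survival_probability(n, K, L):
    """HYPOTHETICAL survival rule: decreases with n, capped below 1."""
    return np.minimum(0.95, 1.0 / (1.0 + n / (K * (1.0 + L))))


rng = np.random.default_rng(0)
xy = rng.uniform(0, 16, size=(1280, 2))  # 5 individuals per unit area
n = local_density(xy, sigma=1.0)
p = survival_probability(n, K=5.0, L=4.0)
```

For uniformly scattered individuals away from the edges, the values in `n` should sit close to the true density, which is the sense in which $n_i$ estimates individuals per unit area.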

An important step in creating any spatial model is dealing with range edges. Because local population density is used to model competition, edge or corner populations can be assigned artificially high fitness values because they lack neighbors within their interaction radius but outside the bounds of the simulation. We approximate a decline in habitat suitability near edges by decreasing the probability of survival proportional to the square root of distance to edges in units of $\sigma$. The final probability of survival for individual $i$ is then

[edge-adjusted survival equation missing from source]

where $x_i$ and $y_i$ are the spatial coordinates of individual $i$, and $W$ is the width (and height) of the square habitat. This buffer roughly counteracts the increase in fitness individuals close to the edge would otherwise have.

To isolate spatial effects from other components of the model such as overlapping generations, increased variance in reproductive success, and density-dependent fitness, we also implemented simulations identical to those above except that mates are selected uniformly at random from the population, and offspring disperse to a uniform random location on the landscape. We refer to this model as the “random mating” model, in contrast to the first, “spatial” model.

We stored the full genealogy and recombination history of final-generation individuals as tree sequences (Kelleher *et al*. 2018), as implemented in SLiM (Haller *et al*. 2019). Scripts for figures and analyses are available at https://github.com/petrelharp/spaceness.

We ran 400 simulations for the spatial and random-mating models on a square landscape of width *W* = 50 with per-unit carrying capacity *K* = 5 (census *N* ≈ 10,000), average lifetime *L* = 4, genome size = 10^{8}, recombination rate = 10^{−9}, and drawing *σ* values from a uniform distribution between 0.2 and 4. To speed up the simulations and limit memory overhead we used a mutation rate of 0 in SLiM and later applied mutations to the tree sequence with msprime’s mutate function (Kelleher *et al*. 2016). Because msprime applies mutations proportionally to elapsed time, we divided the mutation rate of 10^{−8} mutations per site per generation by the average generation time estimated for each value of *σ* (see ‘Demographic Parameters’ below) to convert the rate to units of mutations per site per unit time. (We verified that this procedure produced the correct number of mutations by comparing to a subset of simulations with SLiM-generated mutations, which are applied only at meiosis.) Simulations were run for 1.6 million timesteps (approximately 30*N* generations).
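The unit conversion described above is simple arithmetic, sketched here. The function names are hypothetical, and the generation time of 5 timesteps is an illustrative round number rather than an estimate from any particular simulation.

```python
def per_timestep_mutation_rate(mu_per_generation, generation_time):
    """Convert a per-generation mutation rate to a per-timestep rate."""
    return mu_per_generation / generation_time


def generations_elapsed(timesteps, generation_time):
    """Number of generations spanned by a run of the given length."""
    return timesteps / generation_time


# Illustrative generation time of 5 timesteps (an assumption).
mu_t = per_timestep_mutation_rate(1e-8, 5.0)      # 2e-9 per site per timestep
gens = generations_elapsed(1.6e6, 5.0)            # 320,000 generations
print(mu_t, gens, gens / 10_000)                  # run length in units of N
```

With a census size near 10,000, a run of this length corresponds to roughly 30*N* generations, consistent with the figure quoted above.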

### Demographic Parameters

Our demographic model includes parameters for population density (*K*), mean life span (*L*), and dispersal distance (*σ*). However, nonlinearity of local demographic stochasticity causes actual realized averages of these demographic quantities to deviate from the specified values in a way that depends on the neighborhood size. Therefore, to properly compare to theoretical expectations, we empirically calculated these demographic quantities in simulations. We recorded the census population size in all simulations. To estimate generation times, we stored ages of the parents of every new individual born across 200 timesteps, after a 100 generation burn-in, and took the mean. To estimate variance in offspring number, we tracked the number of offspring for all individuals for 100 timesteps following a 100-timestep burn-in period, subset the resulting table to include only the last timestep recorded for each individual, and calculated the variance in number of offspring across all individuals in timesteps 50-100. All calculations were performed with information recorded in the tree sequence, using pyslim (https://github.com/tskit-dev/pyslim).
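The two empirical estimators described above reduce to a mean and a variance once the relevant records are in hand. The sketch below assumes the parent ages and parent identities have already been extracted into plain arrays; it does not show the pyslim or tree-sequence calls, and the function names are hypothetical.

```python
import numpy as np


def generation_time(parent_ages_at_birth):
    """Mean age (in timesteps) of parents across all recorded births."""
    return float(np.mean(parent_ages_at_birth))


def offspring_variance(parent_ids, n_individuals):
    """Variance in offspring number across individuals labeled 0..n_individuals-1.

    parent_ids holds one entry per parent per birth, so individuals that
    never reproduce contribute a count of zero.
    """
    counts = np.bincount(np.asarray(parent_ids, dtype=int), minlength=n_individuals)
    return float(np.var(counts))
```

Including the zero counts matters: dropping individuals that left no offspring would understate the variance that determines effective population size.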

### Sampling

Our model records the genealogy and sequence variation of the complete population, but in real data, genotypes are only observed from a relatively small number of sampled individuals. We modeled three sampling strategies similar to common data collection methods in empirical genetic studies (Figure 1). “Random” sampling selects individuals at random from across the full landscape, “point” sampling selects individuals proportional to their distance from four equally spaced points on the landscape, and “midpoint” sampling selects individuals in proportion to their distance from the middle of the landscape. Downstream analyses were repeated across all sampling strategies.

### Summary Statistics

We calculated the site frequency spectrum and a set of 18 summary statistics (Table S1) from 60 diploid individuals sampled from the final generation of each simulation using the Python package scikit-allel (Miles and Harding 2017). Statistics included common single-population summaries such as mean pairwise divergence ($\pi$), inbreeding coefficient ($F_{IS}$), and Tajima’s $D$, as well as an isolation-by-distance regression of genetic distance ($D_{xy}$) against the logarithm of geographic distance analogous to Rousset (1997)’s approach, which we summarized as the correlation coefficient between the logarithm of the spatial distance and the proportion of identical base pairs across pairs of individuals.

Following recent studies that showed strong signals for dispersal and demography in the distribution of shared haplotype block lengths (Ringbauer *et al*. 2017; Baharian *et al*. 2016), we also calculated various summaries of the distribution of pairwise identical-by-state (IBS) block lengths among sampled chromosomes. The full distribution of lengths of IBS tracts for each pair of chromosomes was first calculated with a custom Python function. We then calculated the first three moments of this distribution (mean, variance, and skew) and the number of blocks over $10^6$ base pairs both for each pair of individuals and for the full distribution across all pairwise comparisons.

We then estimated correlation coefficients between spatial distance and each moment of the pairwise IBS tract distribution. Because more closely related individuals on average share longer haplotype blocks we expect that spatial distance will be negatively correlated with mean haplotype block length, and that this correlation will be strongest (i.e., most negative) when dispersal is low. The variance, skew, and count of long haplotype block statistics are meant to reflect the relative length of the right (upper) tail of the distribution, which represents the frequency of long haplotype blocks, and so should reflect recent demographic events (Chapman and Thompson 2002). For a subset of simulations, we also calculated cumulative distributions for IBS tract lengths across pairs of distant (> 48 map units) and nearby ( < 2 map units) individuals. Last, we examined the relationship between allele frequency and the spatial dispersion of an allele by calculating the average distance among individuals carrying each derived allele in a set of simulations representing a range of neighborhood sizes.
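To make the IBS tract summaries concrete, here is a minimal sketch of how tract lengths between two haplotypes might be computed and summarized. It is a hypothetical stand-in for the custom function mentioned above, not that function itself; in particular, treating the stretches before the first and after the last differing site as tracts is an assumption.

```python
import numpy as np


def ibs_tract_lengths(hap_a, hap_b, positions, sequence_length):
    """Lengths of identical-by-state tracts between two haplotypes.

    hap_a, hap_b: allele arrays at the variant sites.
    positions: sorted coordinates of those sites.
    """
    differs = np.asarray(hap_a) != np.asarray(hap_b)
    breaks = np.concatenate(([0], np.asarray(positions)[differs], [sequence_length]))
    return np.diff(breaks)


def tract_summaries(lengths, long_threshold=1e6):
    """Mean, variance, skew, and count of tracts longer than the threshold."""
    x = np.asarray(lengths, dtype=float)
    mean, var = x.mean(), x.var()
    skew = ((x - mean) ** 3).mean() / var ** 1.5 if var > 0 else 0.0
    return mean, var, skew, int((x > long_threshold).sum())


lengths = ibs_tract_lengths([0, 1, 0], [1, 1, 1], [10, 40, 70], 100)
print(lengths, tract_summaries(lengths, long_threshold=50))
```

Each pair of chromosomes yields one such set of lengths, and the per-pair means can then be correlated with the spatial distance between the sampled individuals.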

The effects of sampling on summary statistic estimates were summarized by testing for differences in mean (ANOVA; R Core Team 2018) and variance (Levene’s test; Fox and Weisberg 2011) across sampling strategies for each summary statistic.

### Demographic Modeling

To assess the impacts of continuous spatial structure on demographic inference we inferred population size histories for all simulations using two approaches: stairwayplot (Liu and Fu 2015) and SMC++ (Terhorst *et al*. 2016). Stairwayplot fits its model to a genome-wide estimate of the SFS, while SMC++ also incorporates linkage information. For both methods we sampled 20 individuals from all spatial simulations using random, midpoint, and point sampling strategies.

As recommended by its documentation, we used stairwayplot to fit models with multiple bootstrap replicates drawn from empirical genomic data, and took the median inferred *N _{e}* per unit time as the best estimate. We calculated site frequency spectra with scikit-allel (Miles and Harding 2017), generated 100 bootstrap replicates per simulation by resampling over sites, and fit models for all bootstrap samples using default settings.
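Resampling over sites, as described above, can be sketched without touching individual genotypes: drawing sites with replacement is equivalent to a multinomial draw over the frequency classes of the observed SFS. The function name is hypothetical, and whether monomorphic sites are included in the resampled total is left as an assumption of whoever supplies the input vector.

```python
import numpy as np


def bootstrap_sfs(sfs, n_replicates, rng):
    """Bootstrap replicates of a site frequency spectrum by resampling sites."""
    sfs = np.asarray(sfs)
    total = int(sfs.sum())
    # Each resampled site falls in frequency class k with probability sfs[k] / total.
    return rng.multinomial(total, sfs / total, size=n_replicates)


rng = np.random.default_rng(0)
replicates = bootstrap_sfs([50, 20, 10, 5], n_replicates=100, rng=rng)
```

Each row of `replicates` has the same total number of sites as the original spectrum and could be passed to the inference program as one bootstrap input.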

For SMC++, we first output genotypes as VCF with msprime and then used SMC++’s standard pipeline for preparing input files assuming no polarization error in the SFS. We used the first individual in the VCF as the “designated individual” when fitting models, and allowed the program to estimate the recombination rate during optimization. We fit models using the ‘estimate’ command rather than the now recommended cross-validation approach because our simulations had only a single contig.

To evaluate the performance of these methods we binned simulations by neighborhood size and took a rolling median of inferred $N_e$ trajectories across all model fits in a bin for each method and sampling strategy. We also examined how varying levels of isolation by distance impacted the variance of $N_e$ estimates by calculating the standard deviation of $N_e$ from each best-fit model and plotting these against neighborhood size.

### Association Studies

To assess the degree to which spatial structure confounds GWAS we simulated four types of nongenetic phenotype variation for 1000 randomly sampled individuals in each spatial SLiM simulation and conducted a linear regression GWAS with principal components as covariates in PLINK (Purcell *et al*. 2007). SNPs with a minor allele frequency less than 0.5% were excluded from this analysis. Phenotype values were set to vary by two standard deviations across the landscape in a rough approximation of the variation seen in height across Europe (Turchin *et al*. 2012; Garcia and Quintana-Domeque 2006, 2007). Conceptually our approach is similar to that taken by Mathieson and McVean (2012), though here we model fully continuous spatial variation and compare GWAS output across a range of dispersal distances.

In all simulations, the phenotype of each individual is determined by adding independent Gaussian noise with mean zero and standard deviation 10 to a mean that may depend on spatial position. We adjust the geographic pattern of mean phenotype to create spatially autocorrelated environmental influences on phenotype. In the first simulation of *nonspatial* environments, the mean did not change, so that all individuals’ phenotypes were drawn independently from a Gaussian distribution with mean 110 and standard deviation 10. Next, to simulate *clinal* environmental influences on phenotype, we increased the mean phenotype from 100 on the left edge of the range to 120 on the right edge (two phenotypic standard deviations). Concretely, an individual at position (*x*, *y*) in a 50 × 50 landscape has mean phenotype 100 + 2*x*/5. Third, we simulated a more concentrated “*corner*” environmental effect by setting the mean phenotype for individuals with both *x* and *y* coordinates below 20 to 120 (two standard deviations above the rest of the map). Finally, in “*patchy*” simulations we selected 10 random points on the map and set the mean phenotype of all individuals within three map units of each of these points to 120.
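The four environmental phenotype models described above can be written down directly. In this sketch the function names are hypothetical, and the baseline mean of 100 outside the patches in the patchy model is an assumption made by analogy with the corner model.

```python
import numpy as np


def mean_phenotype(xy, model, patch_centers=None):
    """Mean phenotype at each location on a 50 x 50 landscape."""
    x, y = xy[:, 0], xy[:, 1]
    if model == "nonspatial":
        return np.full(len(xy), 110.0)
    if model == "clinal":
        return 100.0 + 2.0 * x / 5.0
    if model == "corner":
        return np.where((x < 20) & (y < 20), 120.0, 100.0)
    if model == "patchy":
        c = np.asarray(patch_centers, dtype=float)
        d = np.hypot(x[:, None] - c[:, 0], y[:, None] - c[:, 1])
        # Assumption: individuals outside every patch have mean 100.
        return np.where((d <= 3.0).any(axis=1), 120.0, 100.0)
    raise ValueError(f"unknown model: {model}")


def simulate_phenotypes(xy, model, rng, patch_centers=None, noise_sd=10.0):
    """Add independent Gaussian noise (s.d. 10 by default) to the spatial mean."""
    means = mean_phenotype(xy, model, patch_centers)
    return means + rng.normal(0.0, noise_sd, size=len(xy))
```

None of these phenotypes depends on genotype, so any SNP that reaches significance in a GWAS on them reflects spatial confounding alone.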

We performed principal components analysis (PCA) using scikit-allel (Miles and Harding 2017) on the matrix of derived allele counts by individual for each simulation. SNPs were first filtered to remove strongly linked sites by calculating LD between all pairs of SNPs in a 200-SNP moving window and dropping one of each pair of sites with an *R*^{2} over 0.1. The LD-pruned allele count matrix was then centered and all sites scaled to unit variance when conducting the PCA, following recommendations in Patterson *et al*. (2006).

We ran linear-model GWAS both with and without the first 10 principal components as covariates in PLINK and summarized results across simulations by counting the number of SNPs with *p*-value below 0.05 after adjusting for an expected false discovery rate of less than 5% (Benjamini and Yekutieli 2001). We also examined *p*-values for systematic inflation by estimating the expected values from a uniform distribution (because no SNPs were used when generating phenotypes), plotting observed against expected values for all simulations, and summarizing across simulations by finding the mean *σ* value in each region of quantile-quantile space. Results from all analyses were summarized and plotted with the “ggplot2” (Wickham 2016) and “cowplot” (Wilke 2019) packages in R (R Core Team 2018).
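For readers unfamiliar with the multiple-testing adjustment cited above, the following is a minimal NumPy sketch of the Benjamini and Yekutieli step-up adjustment. It illustrates the standard procedure and is not the code used in the analysis; the function name is hypothetical.

```python
import numpy as np


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    c_m = np.sum(1.0 / ranks)  # harmonic-sum penalty for dependence among tests
    scaled = p[order] * m * c_m / ranks
    # Enforce monotonicity from the largest p-value downward, then cap at 1.
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


adjusted = benjamini_yekutieli([0.001, 0.5, 0.9, 0.04])
n_significant = int((adjusted < 0.05).sum())
```

Counting `adjusted < 0.05` per simulation gives the number of SNPs flagged as associated; since the phenotypes here are nongenetic, every such SNP is a false discovery.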

## Results

### Demographic Parameters

Adjusting the spatial dispersal and interaction distance, $\sigma$, has a surprisingly large effect on demographic quantities that are usually fixed in Wright-Fisher models – the generation time, census population size, and variance in offspring number. These are shown in Figure 2. This occurs because, even though the “population density” ($K$) and “mean lifetime” ($L$) parameters were the same in all simulations, the strength of stochastic effects depends strongly on $\sigma$. For instance, the population density near to individual $i$ (denoted $n_i$ above) is computed by averaging over roughly $N_W = 4\pi K\sigma^2$ individuals, and so has standard deviation proportional to $1/\sqrt{N_W}$ – it is more variable at lower densities. (Recall that $N_W$ is Wright’s neighborhood size.) Since the probability of survival is a nonlinear function of $n_i$, actual equilibrium densities and lifetimes differ from $K$ and $L$. This is the reason that we included *random mating* simulations – where mate choice and offspring dispersal are both nonspatial – since this should preserve the random fluctuations in local population density while destroying any spatial genetic structure. We verified that random mating models retained no geographic signal by showing that summary statistics did not differ significantly between sampling regimes (Table S2), unlike in spatial models (discussed below).

There are a few additional things to note about Figure 2. First, all three quantities are non-monotone with neighborhood size. Census size largely declines as neighborhood size increases for both the spatial and random mating models. However, for spatial models this decline only begins for neighborhood size ≥ 10. At neighborhood sizes larger than 100, the spatial and random mating models are indistinguishable from one another, a sign that our simulations are performing as expected. Census sizes range from ≈ 14,000 at low *σ* in the random mating model to ≈ 10,000 for both models when neighborhood sizes approach 1,000.

Generation time similarly shows complex behavior with respect to neighborhood sizes, and varies between 5.2 and 4.9 timesteps per generation across the parameter range explored. Under both the spatial and random mating models, generation time reaches a minimum at a neighborhood size of around 50. Interestingly, under the range of neighborhood sizes that we examined, generation times between the random mating and spatial models are never quite equivalent – presumably this would cease to be the case at neighborhood sizes higher than we simulated here.

Last, we looked at the variance in number of offspring – a key parameter determining the effective population size. Surprisingly, the spatial and random mating models behave quite differently: while the variance in offspring number increases nearly monotonically under the spatial model, the random mating model actually shows a decline in the variance in offspring number until a neighborhood size ≈ 10 before it increases and eventually equals what we observe in the spatial case.

### Impacts of Continuous Space on Population Genetic Summary Statistics

Even though certain aspects of population demography depend on the scale of spatial interactions, it still could be that population genetic variation is well-described by a well-mixed population model. Indeed, mathematical results suggest that genetic variation in some spatial models should be well-approximated by a Wright-Fisher population if neighborhood size is large and all samples are geographically widely separated (Wilkins 2004; Zähle *et al*. 2005). However, the behavior of most common population genetic summary statistics other than Tajima’s *D* (Städler *et al*. 2009) has not yet been described in realistic geographic models. Moreover, as we will show, spatial sampling strategies can affect summaries of variation at least as strongly as the underlying population dynamics.

### Site Frequency Spectra and Summaries of Diversity

Figure 3 shows the effect of varying neighborhood size and sampling strategy on the site frequency spectrum (Figure 3A) and several standard population genetic summary statistics (Figure 3B). Consistent with findings in island and stepping stone simulations (Städler *et al*. 2009), the SFS shows a significant enrichment of intermediate frequency variants in comparison to the nonspatial expectation. This bias is most pronounced at neighborhood sizes ≤ 100 and is exacerbated by midpoint and point sampling of individuals (depicted in Figure 1). Reflecting this, Tajima’s *D* is quite positive in the same situations (Figure 3B). Notably, the point at which Tajima’s *D* approaches 0 differs strongly across sampling strategies – varying from a neighborhood size of roughly 50 for random sampling to at least 1000 for midpoint sampling.

One of the most commonly used summaries of variation is Tajima’s summary of nucleotide divergence, *θ _{π}*, calculated as the mean density of nucleotide differences averaged across pairs of samples. As can be seen in Figure 3B, *θ _{π}* in the spatial model is inflated by up to three-fold relative to the random mating model. This pattern is opposite the expectation from census population size (Figure 2), because the spatial model has *lower* census size than the random mating model at neighborhood sizes less than 100. Differences between these models likely occur because *θ _{π}* is a measure of mean time to most recent common ancestor between two samples, and at small values of *σ*, the time for dispersal to mix ancestry across the range exceeds the mean coalescent time under random mating. (For instance, at the smallest value of *σ* = 0.2, the range is 250 dispersal distances wide, and since the location of a diffusively moving lineage after *k* generations has variance *kσ*^{2}, it takes around 250^{2} = 62500 generations to mix across the range, which is roughly ten times larger than the random mating effective population size.) *θ _{π}* using each sampling strategy approaches the random mating expectation at its own rate, but by a neighborhood size of around 100 all models are roughly equivalent. Interestingly, the effect of sampling strategy is reversed relative to that observed in Tajima’s *D* – midpoint sampling reaches random mating expectations around neighborhood size 50, while random sampling is inflated until around neighborhood size 100.

Values of observed heterozygosity and its derivative *F _{IS}* also depend heavily on neighborhood size under spatial models as well as the sampling scheme. *F _{IS}* is inflated above the expectation across most of the parameter space examined and across all sampling strategies. This effect is caused by a deficit of heterozygous individuals in low-dispersal simulations – a continuous-space version of the Wahlund effect (Wahlund 1928). Indeed, for random sampling under the spatial model, *F _{IS}* does not approach the random mating equivalent until neighborhood sizes of nearly 1000. On the other hand, the dependency of raw observed heterozygosity on neighborhood size is not monotone. Under midpoint sampling observed heterozygosity is inflated even over the random mating expectation, as a result of a higher proportion of heterozygotes occurring in the middle of the landscape (Figure S3). This echoes a report from Shirk and Cushman (2014) who observed a similar excess of heterozygosity in the middle of the landscape when simulating under a lattice model.

### IBS tracts and correlations with geographic distance

We next turn our attention to the effect of geographic distance on haplotype block length sharing, summarized for sets of nearby and distant individuals in Figure 4. There are two main patterns to note. First, nearby individuals share more long IBS tracts than distant individuals (as expected because they are on average more closely related). Second, the difference in the number of long IBS tracts between nearby and distant individuals decreases as neighborhood size increases. This reflects the faster spatial mixing of populations with higher dispersal, which breaks down the correlation between the IBS tract length distribution and geographic distance. This can also be seen in the bottom row of Figure 3B, where the correlation coefficients between the summaries of the IBS tract length distribution (the mean, skew, and count of tracts over 10^{6} bp) and geographic distance approach 0 as neighborhood size increases.

The patterns observed for correlations of IBS tract lengths with geographic distance are similar to those observed in the more familiar regression of allele frequency measures such as *D _{xy}* (i.e., “genetic distance”) or *F _{ST}* against geographic distance (Rousset 1997). *D _{xy}* is positively correlated with the geographic distance between the individuals, and the strength of this correlation declines as dispersal increases (Figure 3B), as expected (Wright 1943; Rousset 1997). This relationship is very similar across random and point sampling strategies, but is weaker for midpoint sampling, perhaps due to a dearth of long-distance comparisons. In much of empirical population genetics a regression of genetic differentiation against spatial distance is a de-facto metric of the significance of isolation by distance. The similar behavior of moments of the pairwise distribution of IBS tract lengths shows why haplotype block sharing has recently emerged as a promising source of information on spatial demography through methods described in Ringbauer *et al*. (2017) and Baharian *et al*. (2016).
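As an illustration of the kind of regression described here, the sketch below correlates a pairwise genetic distance with pairwise geographic distance. It is a simplified stand-in rather than the calculation behind Figure 3B: the function name `isolation_by_distance` is hypothetical, and the genetic distance used (mean absolute difference in derived allele count per site) is an assumed proxy for *D _{xy}*, not the statistic itself.

```python
import numpy as np

def isolation_by_distance(genotypes, coords):
    """Hypothetical helper. genotypes: (n_ind, n_sites) derived allele
    counts (0/1/2); coords: (n_ind, 2) sample locations.
    Returns (Pearson r, regression slope) across all pairs of individuals."""
    g = np.asarray(genotypes, dtype=float)
    xy = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(g.shape[0], k=1)        # all unordered pairs
    # Assumed proxy for genetic distance: mean per-site genotype difference
    gen_dist = np.abs(g[i] - g[j]).mean(axis=1)
    geo_dist = np.sqrt(((xy[i] - xy[j]) ** 2).sum(axis=1))
    r = np.corrcoef(geo_dist, gen_dist)[0, 1]
    slope, _intercept = np.polyfit(geo_dist, gen_dist, 1)
    return r, slope

# Toy usage: allele frequencies follow a cline along the x axis
rng = np.random.default_rng(0)
x = np.linspace(0.05, 0.95, 30)
toy_coords = np.column_stack([x, np.zeros_like(x)])
toy_genotypes = rng.binomial(2, x[:, None], size=(30, 400))
r, slope = isolation_by_distance(toy_genotypes, toy_coords)
```

Because pairs of individuals are not independent observations, a permutation approach such as a Mantel test is usually preferred over the ordinary regression *p* value when judging significance.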

### Spatial distribution of allele copies

Mutations occur in individuals and spread geographically over time. Because low frequency alleles generally represent recent mutations (Sawyer 1977; Griffiths *et al*. 1999), the geographic dispersion of an allele may covary along with its frequency in the population. To visualize this relationship we calculated the average distance among individuals carrying a focal derived allele across simulations with varying neighborhood sizes, shown in Figure 5. On average we find that low frequency alleles are the most geographically restricted, and that the extent to which geography and allele frequency are related depends on the amount of dispersal in the population. For populations with large neighborhood sizes we found that even very low frequency alleles can be found across the full landscape, whereas in populations with low neighborhood sizes the relationship between distance among allele copies and their frequency is quite strong. This is the basic process underlying Novembre and Slatkin’s (2009) method for estimating dispersal distances based on the distribution of low frequency alleles, and also generates the greater degree of bias in GWAS effect sizes for low frequency alleles identified in Mathieson and McVean (2012).
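The summary plotted against allele frequency here can be sketched as follows. This is an assumed, minimal implementation: the function name `mean_carrier_distance` and the input layout (a boolean matrix marking which individuals carry the derived allele at each site) are illustrative choices, not the authors' code.

```python
import numpy as np

def mean_carrier_distance(carriers, coords):
    """Hypothetical helper. carriers: (n_ind, n_sites) array, nonzero where
    an individual carries the derived allele; coords: (n_ind, 2) locations.
    Returns (carrier frequency, mean pairwise distance among carriers) for
    every site with at least two carriers."""
    car = np.asarray(carriers, dtype=bool)
    xy = np.asarray(coords, dtype=float)
    # Full matrix of Euclidean distances between individuals
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    n = car.shape[0]
    freqs, dists = [], []
    for site in car.T:
        idx = np.flatnonzero(site)
        k = idx.size
        if k < 2:
            continue                               # no pairs to average over
        sub = dist[np.ix_(idx, idx)]
        # Diagonal is zero, so this is the mean over distinct pairs
        dists.append(sub.sum() / (k * (k - 1)))
        freqs.append(k / n)
    return np.array(freqs), np.array(dists)
```

Binning the returned distances by frequency would give a curve of the kind shown in Figure 5.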

### Effects of Space on Demographic Inference

One of the most important uses for population genetic data is inferring demographic history of populations. As demonstrated above, the site frequency spectrum and the distribution of IBS tracts vary across neighborhood sizes and sampling strategies. Does this variation lead to different inferences of past population sizes? To ask this we inferred population size histories from samples drawn from our simulated populations with two approaches: stairwayplot (Liu and Fu 2015), which uses a genome-wide estimate of the SFS, and SMC++ (Terhorst *et al*. 2016), which incorporates information on both the SFS and linkage disequilibrium across the genome.

Figure 6A shows the median inferred population size histories from each method across all simulations, grouped by neighborhood size and sampling strategy. In general these methods tend to slightly overestimate ancient population sizes and infer recent population declines when neighborhood sizes are below 20 and sampling is spatially clustered (Figure 6A, Figure S4). The overestimation of ancient population sizes however is relatively minor, averaging around a two-fold inflation at 10,000 generations before present in the worst-affected bins. For stairwayplot we found that many runs infer dramatic population bottlenecks in the last 1,000 generations when sampling is spatially concentrated, resulting in ten-fold or greater underestimates of recent population sizes. However SMC++ appeared more robust to this error, with runs on point- and midpoint-sampled simulations at the lowest neighborhood sizes underestimating recent population sizes by roughly half and those on randomly sampled simulations showing little error. Above neighborhood sizes of around 100, both methods performed relatively well when averaging across results from multiple simulations.

However, individual model fits from both methods frequently reflected turbulent demographic histories (Figure S4), with the standard deviation of inferred *N _{e}* across time points often exceeding the expected *N _{e}* for both methods (Figure 6B). That is, despite the constant population sizes in our simulations, both methods tended to infer large fluctuations in population size over time, which could potentially result in incorrect biological interpretations. On average the variance of inferred population sizes was elevated at the lowest neighborhood sizes and declined as dispersal increased, with the strongest effects seen in stairwayplot model fits for clustered sampling and neighborhood sizes less than 20 (Figure 6B).

### GWAS

To ask what confounding effects spatial genetic variation might have on genome-wide association studies we performed GWAS on our simulations using phenotypes that were determined solely by the environment – so, any SNP showing statistically significant correlation with phenotype is a false positive. As expected, spatial autocorrelation in the environment causes spurious associations across much of the genome if no correction for genetic relatedness among samples is performed (Figures 7 and S5). This effect is particularly strong for clinal and corner environments, for which the lowest dispersal levels cause over 60% of SNPs in the sample to return significant associations. Patchy environmental distributions, which are less strongly spatially correlated (Figure 7A), cause fewer false positives overall but still produce spurious associations at roughly 10% of sites at the lowest neighborhood sizes. Interestingly we also observed a small number of false positives in roughly 3% of analyses on simulations with nonspatial environments, both with and without PC covariates included in the regression.

The confounding effects of geographic structure are well known, and it is common practice to control for them by including principal components (PCs) as covariates. This mostly works in our simulations – after incorporating the first ten PC axes as covariates, the vast majority of SNPs no longer surpass a significance threshold chosen to have a 5% false discovery rate (FDR). However, a substantial number of SNPs – up to 1.5% at the lowest dispersal distances – still surpass this threshold (and thus would be false positives in a GWAS), especially under “corner” and “patchy” environmental distributions (Figure 7C). At neighborhood sizes larger than 500, up to 0.31% of SNPs were significant for corner and clinal environments. Given an average of 132,000 SNPs across simulations after MAF filtering, this translates to up to 382 false-positive associations; for human-sized genomes, this number would be much larger. In most cases the *p* values for these associations were significant after FDR correction but would not pass the threshold for significance under the more conservative Bonferroni correction (see example Manhattan plots in Figure S5).
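The difference between an FDR threshold and a Bonferroni threshold can be illustrated with a short sketch. It assumes the Benjamini-Hochberg step-up procedure as the FDR method, which is one standard choice and not necessarily the one used in our analyses; both function names are hypothetical.

```python
import numpy as np

def bh_significant(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).
    Returns a boolean mask of tests declared significant."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # Largest rank k with p_(k) <= alpha * k / m; reject ranks 1..k
    below = ranked <= alpha * np.arange(1, m + 1) / m
    sig = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.flatnonzero(below))
        sig[order[: k + 1]] = True
    return sig

def bonferroni_significant(pvals, alpha=0.05):
    """Bonferroni correction: compare every p-value to alpha / m."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size

toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_fdr = int(bh_significant(toy_p).sum())           # 2 tests pass at 5% FDR
n_bonf = int(bonferroni_significant(toy_p).sum())  # 1 test passes Bonferroni
```

In this toy example the second-smallest *p* value passes the FDR threshold but not the Bonferroni threshold, mirroring the weak false positives described above.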

Clinal environments cause an interesting pattern in false positives after PC correction: at low neighborhood sizes the correction removes nearly all significant associations, but at neighborhood sizes above roughly 250 the proportion of significant SNPs increases to up to 0.4% (Figure 7). This may be due to a loss of descriptive power of the PCs – as neighborhood size increases, the total proportion of variance explained by the first 10 PC axes declines from roughly 10% to 4% (Figure 7B). Essentially, PCA seems unable to effectively summarize the weak population structure present in large-neighborhood simulations, but these populations continue to have enough spatial structure to create significant correlations between genotypes and the environment. A similar process can also be seen in the corner phenotype distribution, in which the count of significant SNPs initially declines as neighborhood size increases and then increases at approximately the point at which the proportion of variance explained by PCA approaches its minimum.

Figure 7D shows quantile-quantile plots illustrating the degree of genome-wide inflation of test statistics in PC-corrected GWAS across all simulations and environmental distributions. For clinal environments, −log_{10}(*p*) values are most inflated when neighborhood sizes are large, consistent with the pattern observed in the count of significant associations after PC regression. In contrast, corner and patchy environments cause the greatest inflation in −log_{10}(*p*) at neighborhood sizes less than 100, which likely reflects the inability of PCA to account for fine-scale structure caused by very limited dispersal. Finally, we observed that PC regression appears to overfit to some degree for all phenotype distributions, visible in Figure 7D as points falling below the 1:1 line.

## Discussion

In this study, we have used efficient forward time population genetic simulations to describe the myriad influences of continuous geography on genetic variation. In particular, we examine how three main types of downstream empirical inference are affected by unmodeled spatial population structure – 1) population genetic summary statistics, 2) inference of population size history, and 3) genome-wide association studies (GWAS). As discussed above, space often matters (and sometimes dramatically), both because of how samples are arranged in space, and because of the inherent patterns of relatedness established by geography.

### Effects of Dispersal

Limited dispersal inflates effective population size, creates correlations between genetic and spatial distances, and introduces strong distortions in the site frequency spectrum that are reflected in a positive Tajima’s *D* (Figure 3). At the lowest dispersal distances, this can increase genetic diversity threefold relative to random-mating expectations. These effects are strongest when neighborhood sizes are below 100, but in combination with the effects of nonrandom sampling they can persist up to neighborhood sizes of at least 1000 (e.g., inflation in Tajima’s *D* and observed heterozygosity under midpoint sampling). If samples are chosen uniformly from across space, the general pattern is similar to expectations of the original analytic model of Wright (1943), which predicts that populations with neighborhood sizes under 100 will differ substantially from random mating, while those above 10,000 will be nearly indistinguishable from panmixia.

The patterns observed in sequence data reflect the effects of space on the underlying genealogy. Nearby individuals coalesce rapidly under limited dispersal and so are connected by short branch lengths, while distant individuals take much longer to coalesce than they would under random mating. Mutation and recombination events in our simulation both occur at a constant rate along branches of the genealogy, so the genetic distance and number of recombination events separating sampled individuals simply gives a noisy picture of the genealogies connecting them. Tip branches (i.e., branches subtending only one individual) are then relatively short, and branches in the middle of the genealogy connecting local groups of individuals relatively long, leading to the biases in the site frequency spectrum shown in Figure 3.

The genealogical patterns introduced by limited dispersal are particularly apparent in the distribution of haplotype block lengths (Figure 3). This is because identical-by-state tract lengths reflect the impacts of two processes acting along the branches of the underlying genealogy – both mutation and recombination – rather than just mutation as is the case when looking at the site frequency spectrum or related summaries. This means that the pairwise distribution of haplotype block lengths carries with it important information about genealogical variation in the population, and correlation coefficients between moments of this distribution and geographic location contain signal similar to the correlations between *F _{ST}* or *D _{xy}* and geographic distance (Rousset 1997). Indeed this basic logic underlies two recent studies explicitly estimating dispersal from the distribution of shared haplotype block lengths (Ringbauer *et al*. 2017; Baharian *et al*. 2016). Conversely, because haplotype-based measures of demography are particularly sensitive to variation in the underlying genealogy, inference approaches that assume random mating when analyzing the distribution of shared haplotype block lengths are likely to be strongly affected by spatial processes.

### Effects of Sampling

One of the most important differences between random mating and spatial models is the effect of sampling: in a randomly mating population the spatial distribution of sampling effort has no effect on estimates of genetic variation (Table S1), but when dispersal is limited sampling strategy can compound spatial patterns in the underlying genealogy and create pervasive impacts on all downstream genetic analyses (see also Städler *et al*. (2009)). In most species, the difficulty of traveling through all parts of a species range and the inefficiency of collecting single individuals at each sampling site mean that most studies follow something closest to the “point” sampling strategy we simulated, in which multiple individuals are sampled from nearby points on the landscape. For example, in ornithology a sample of 10 individuals per species per locality is a common target when collecting for natural history museums. In classical studies of *Drosophila* variation the situation is considerably worse: a single orchard might be extensively sampled.

When sampling is clustered at points on a landscape and dispersal is limited, the sampled individuals will be more closely related than a random set of individuals. Average coalescence times of individuals collected at a locality will then be more recent and branch lengths shorter than expected by analyses assuming random mating. This leads to fewer mutations and recombination events occurring since their last common ancestor, causing individuals sampled at the same locality to share longer average IBS tracts and have fewer nucleotide differences than a random set of individuals would. For some data summaries, such as Tajima’s *D*, Watterson’s Θ, or the correlation coefficient between spatial distance and the count of long haplotype blocks, this can result in large differences in estimates between random and point sampling (Figure 3). Inferring underlying demographic parameters from these summary statistics – unless the nature of the sampling is somehow taken into account – will be subject to bias if sampling is not random across the landscape.

However, we observed the largest sampling effects using “midpoint” sampling. This model is meant to reflect a bias in sampling effort towards the middle of a species’ range. In empirical studies this sampling strategy could arise if, for example, researchers choose to sample the center of the range and avoid range edges to maximize probability of locating individuals during a short field season. Because midpoint sampling provides limited spatial resolution it dramatically reduces the magnitude of observed correlations between spatial and genetic distances. More surprisingly, midpoint sampling also leads to strongly positive Tajima’s *D* and an inflation in the proportion of heterozygous individuals in the sample – similar to the effect of sampling a single deme in an island model as reported in Städler *et al*. (2009). This increase in observed heterozygosity appears to reflect the effects of range edges, which are a fundamental facet of spatial genetic variation. If individuals move randomly in a finite two-dimensional landscape then regions in the middle of the landscape receive migrants from all directions while those on the edge receive no migrants from at least one direction. The average number of new mutations moving into the middle of the landscape is then higher than the number moving into regions near the range edge, leading to higher heterozygosity and lower inbreeding coefficients (*F _{IS}*) away from range edges. Though here we used only a single parameterization of fitness decline at range edges, we believe this is a general property of non-infinite landscapes, as it has also been observed in previous studies simulating under lattice models (Neel *et al*. 2013; Shirk and Cushman 2014).

In summary, we recommend that empirical researchers collect individuals from across as much of the species’ range as practical, choosing samples separated by a range of spatial scales. Many summary statistics are designed for well-mixed populations, and so provide different insights into genetic variation when applied to different subsets of the population. Applied to a cluster of samples, summary statistics based on segregating sites (e.g., Watterson’s Θ and Tajima’s *D*), heterozygosity, or the distribution of long haplotype blocks, can be expected to depart significantly from what would be obtained from a wider distribution of samples. Comparing the results of analyses conducted on all individuals versus those limited to single individuals per locality can provide an informative contrast. Finally we wish to point out that the bias towards intermediate allele frequencies that we observe may mean that the importance of linked selection, at least as is gleaned from the site frequency spectrum, may be systematically underestimated currently.

### Demography

Previous studies have found that population structure and nonrandom sampling can create spurious signals of population bottlenecks when attempting to infer demographic history with microsatellite variation, summary statistics, or runs of homozygosity (Chikhi *et al*. 2010; Städler *et al*. 2009; Ptak and Przeworski 2002; Mazet *et al*. 2015). Here we found that methods that infer detailed population trajectories through time based on the SFS and patterns of LD across the genome are also subject to this bias, with some combinations of dispersal and sampling strategy systematically inferring deep recent population bottlenecks and overestimating ancient *N _{e}* by around a factor of 2. We were surprised to see that both stairwayplot and SMC++ can tolerate relatively strong isolation by distance – i.e., neighborhood sizes of 20 – and still perform well when averaging results across multiple simulations. Inference in populations with neighborhood sizes over 20 was relatively unbiased unless samples were concentrated in the middle of the range (Figure 6). Although median demography estimates across many independent simulations were fairly accurate, empirical work has only a single estimate to work with, and individual model fits (Figure S4) suggest that spuriously inferred population size changes and bottlenecks are common, especially at small neighborhood sizes. As we will discuss below, most empirical estimates of neighborhood size, including all estimates for human populations, are large enough that population size trajectories inferred by these approaches should not be strongly affected by spatial biases created by dispersal in continuous landscapes. In contrast, Mazet *et al*. (2015) found that varying migration rates through time could create strong biases in inferred population trajectories from an *n*-island model with parameters relevant for human history, suggesting that changes in migration rates through time are more likely to drive variation in inferred *N _{e}* than isolation by distance.

We found that SMC++ was more robust to the effects of space than stairwayplot, underestimating recent population sizes by roughly half in the worst time periods rather than nearly 10-fold as with stairwayplot. Though this degree of variation in population size is certainly meaningful in an ecological context, it is relatively minor in population genetic terms. A more worrying pattern was the high level of variance in inferred *N _{e}* trajectories for individual model fits using these methods, which was highest in simulations with the smallest neighborhood size (Figure 6, Figure S4). This suggests that, at a minimum, researchers working with empirical data should replicate analyses multiple times and take a rolling average if model fits are inconsistent across runs. Splitting samples and running replicates on separate subsets – the closest an empirical study can come to our design of averaging the results from multiple simulations – may also alleviate this issue.
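The contrast between averaged and individual model fits can be made concrete with a small sketch. It assumes that each replicate fit has already been evaluated on a shared grid of time points; the function name `summarize_ne_fits` and the toy numbers are illustrative only.

```python
import numpy as np

def summarize_ne_fits(ne_fits, expected_ne):
    """Hypothetical helper. ne_fits: (n_replicates, n_timepoints) inferred
    Ne values on a shared time grid. Returns the median trajectory across
    replicates and, per replicate, the standard deviation of inferred Ne
    across time points scaled by the expected Ne."""
    fits = np.asarray(ne_fits, dtype=float)
    median_trajectory = np.median(fits, axis=0)
    scaled_sd = fits.std(axis=1, ddof=1) / expected_ne
    return median_trajectory, scaled_sd

# Toy example: one flat fit plus a spurious decline and a spurious expansion
toy_fits = [[100, 100, 100],
            [50, 100, 150],
            [150, 100, 50]]
median_traj, scaled_sd = summarize_ne_fits(toy_fits, expected_ne=100)
```

In this toy case the median trajectory is flat even though two of the three individual fits show large changes through time, which is why a single empirical fit deserves more caution than the averaged results suggest.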

Our analysis suggests that many empirical analyses of population size history using methods like SMC++ are robust to error caused by spatial structure within continuous landscapes. Inferences drawn from static SFS-based methods like stairwayplot should be treated with caution when there are signs of isolation by distance in the underlying data (for example, if a regression of *F _{ST}* against the logarithm of geographic distance has a significantly positive slope), and in particular an inference of population bottlenecks in the last 1000 years should be discounted if sampling is clustered, but estimates of deeper time patterns are likely to be fairly accurate. The biases in the SFS and haplotype structure identified above (see also Wakeley 1999; Chikhi *et al*. 2010; Städler *et al*. 2009) are apparently small enough that they fall within the range of variability regularly inferred by these approaches, at least on datasets of the size we simulated.

### GWAS

Spatial structure is particularly challenging for genome-wide association studies, because the effects of dispersal on genetic variation are compounded by spatial variation in the environment (Mathieson and McVean 2012). Spatially restricted mate choice and dispersal cause variation in allele frequencies across the range of a species. If environmental factors affecting the phenotype of interest also vary over space, then allele frequencies and environmental exposures will covary over space. In this scenario an uncorrected GWAS will infer genetic associations with a purely environmental phenotype at any site in the genome that is differentiated over space, and the relative degree of bias will be a function of the degree of covariation in allele frequencies and the environment (i.e., Figure 7C, bottom panel). This pattern has been demonstrated in a variety of simulation and empirical contexts (Price *et al*. 2006; Yu *et al*. 2005; Young *et al*. 2018; Mathieson and McVean 2012; Kang *et al*. 2008, 2010; Bulik-Sullivan *et al*. 2015; Berg *et al*. 2018; Sohail *et al*. 2018).

Incorporating PC positions as covariates in a linear-regression GWAS (Price *et al*. 2006) is designed to address this challenge by regressing out a baseline level of “average” differentiation. In essence, a PC-corrected GWAS asks “what regions of the genome are more associated with this phenotype than the average genome-wide association observed across populations?” In our simulations, we observed that this procedure can fail under a variety of circumstances. If dispersal is limited and environmental variation is clustered in space (i.e., corner or patchy distributions in our simulations), PCA positions fail to capture the fine-scale spatial structure required to remove all signals of association. Conversely, as dispersal increases, PCA loses power to describe population structure before spatial mixing breaks down the relationship between genotype and the environment. These effects were observed with all spatially correlated environmental patterns, but were particularly pronounced if environmental effects are concentrated in one region, as was also found by Mathieson and McVean (2012). Though increasing the number of PC axes used in the analysis may reduce the false-positive rate, this may also decrease the power of the test to detect truly causal alleles (Lawson *et al*. 2019).
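For readers who want to see the mechanics, the following is a minimal sketch of a linear-regression GWAS with PC covariates. It is not the software used for our analyses. The function name `pc_corrected_gwas` is hypothetical, the PCs are taken from a plain SVD of the centered genotype matrix, and the *p* values use a normal approximation to the *t* distribution (an assumption that is reasonable only for large samples).

```python
import math
import numpy as np

def pc_corrected_gwas(genotypes, phenotype, n_pcs=10):
    """Hypothetical helper. genotypes: (n_ind, n_snps) allele counts;
    phenotype: (n_ind,) values. Returns per-SNP (beta, p) from the model
    phenotype ~ intercept + PCs + SNP."""
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n, m = g.shape
    # PCs of ancestry from the centered genotype matrix
    gc = g - g.mean(axis=0)
    u, s, _ = np.linalg.svd(gc, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]
    X = np.column_stack([np.ones(n), pcs])         # intercept + PC covariates
    # Residualize phenotype and SNPs on the covariates; regressing the
    # residuals gives the same slope as the full multiple regression
    q, _ = np.linalg.qr(X)
    y_res = y - q @ (q.T @ y)
    g_res = g - q @ (q.T @ g)
    sxx = (g_res ** 2).sum(axis=0)
    sxy = g_res.T @ y_res
    beta = np.divide(sxy, sxx, out=np.zeros(m), where=sxx > 0)
    dof = n - X.shape[1] - 1
    rss = (y_res ** 2).sum() - beta * sxy
    se = np.sqrt(rss / dof / np.where(sxx > 0, sxx, np.nan))
    z = beta / se
    # Two-sided p-values, normal approximation (assumption, see lead-in)
    p = np.array([math.erfc(abs(t) / math.sqrt(2)) if np.isfinite(t) else 1.0
                  for t in z])
    return beta, p
```

The default of ten PCs mirrors the number of axes used in our analyses; increasing `n_pcs` in such a sketch removes more structure but also consumes degrees of freedom.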

In this work we simulated a single chromosome with size roughly comparable to one human chromosome. If we scale the number of false-positive associations identified in our analyses to a GWAS conducted on whole-genome data from humans, we would expect to see several thousand weak false-positive associations after PC corrections in a population with neighborhood sizes up to at least 1000 (which should include values appropriate for many human populations). Notably, very few of the spurious associations we identified would be significant at a conservative Bonferroni-adjusted *p*-value cutoff (see Figure S5). This suggests that GWAS focused on finding strongly associated alleles for traits controlled by a limited number of variants in the genome are likely robust to the impacts of continuous spatial structure. However, methods that analyze the combined effects of thousands or millions of weakly associated variants such as polygenic risk scores (Khera *et al*. 2018) are likely to be affected by subtle population structure. Indeed as recently identified in studies of genotype associations for human height in Europe (Berg *et al*. 2018; Sohail *et al*. 2018), PC regression GWAS in modern human populations do include residual signal of population structure in large-scale analyses of polygenic traits. When attempting to make predictions across populations with different environmental exposures, polygenic risk scores affected by population structure can be expected to offer low predictive power, as was shown in a recent study finding lower performance outside European populations (Martin *et al*. 2019).

In summary, spatial covariation in population structure and the environment confounds the interpretation of GWAS *p*-values, and correction using principal components is insufficient to fully separate these signals for polygenic traits under a variety of environmental and population parameter regimes. Other GWAS methods may be less sensitive to this confounding, but there is no obvious reason that this should be so. One approach to estimating the degree of bias in GWAS caused by population structure is LD score regression (Bulik-Sullivan *et al*. 2015). Though this approach appears to work well in practice, its interpretation is not always straightforward and it is likely biased by the presence of linked selection (Berg *et al*. 2018). In addition, we observed that in many cases the false-positive SNPs we identified appeared to be concentrated in LD peaks similar to those expected from truly causal sites (Figure S5), which may confound LD score regression.

We suggest a straightforward alternative for species in which the primary axis of population differentiation is space (note this is likely not the case for some modern human populations): run a GWAS with spatial coordinates as phenotypes and check for *p*-value inflation or significant associations. If significant associations with sample locality are observed after correcting for population structure, the method is sensitive to false positives induced by spatial structure. This is essentially the approach taken in our “clinal” model (though we add normally distributed noise to our phenotypes). Of course, it is possible that genotypes indirectly affect individual locations by adjusting organismal fitness and thus habitat selection across spatially varying environments, but we believe that this hypothesis should be tested against a null of stratification bias inflation rather than accepted as true based on GWAS results.
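One simple way to check for *p*-value inflation in such a test is the genomic inflation factor, sketched below. The choice of this particular statistic is an assumption for illustration, as is the function name `genomic_inflation`; the input would be the *p* values from a GWAS run with a spatial coordinate as the phenotype.

```python
from statistics import NormalDist
import numpy as np

def genomic_inflation(pvals):
    """Hypothetical helper: ratio of the median observed 1-df chi-square
    statistic to its expected median under the null. Values near 1 suggest
    little inflation; values well above 1 suggest residual confounding."""
    p = np.clip(np.asarray(pvals, dtype=float), 1e-300, 1.0)
    nd = NormalDist()
    # Two-sided p-value -> z-score -> chi-square statistic with 1 df
    chi2 = np.array([nd.inv_cdf(pi / 2.0) ** 2 for pi in p])
    median_null = nd.inv_cdf(0.25) ** 2            # median of chi-square(1)
    return float(np.median(chi2) / median_null)

# Evenly spaced p-values behave like a well-calibrated null
null_p = np.arange(1, 1000) / 1000.0
lambda_null = genomic_inflation(null_p)
# Squaring shifts p-values toward zero, mimicking inflated test statistics
lambda_inflated = genomic_inflation(null_p ** 2)
```

A value of the inflation factor well above 1 for a GWAS on latitude or longitude would indicate that the chosen correction leaves spatial structure in the test statistics.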

### Where are natural populations on this spectrum?

For how much of the tree of life do spatial patterns circumscribe genomic variation? In Table 1 we gathered estimates of neighborhood size from a range of organisms to get an idea of how likely dispersal is to play an important role in patterns of variation. Though this sample is almost certainly biased towards small-neighborhood species (because few studies have quantified neighborhood size in species with very high dispersal or population density), we find that neighborhood sizes in the range we simulated are fairly common across a range of taxa. At the extreme low end of empirical neighborhood size estimates we see some flowering plants, large mammals, and colonial insects like ants. Species such as these have neighborhood size estimates small enough that spatial processes are likely to strongly influence inference. These include some human populations such as the Gainj- and Kalam-speaking people of Papua New Guinea, in which the estimated neighborhood sizes in Rousset (1997) range from 40 to 410 depending on the method of estimation. Many more species occur in a middle range of neighborhood sizes between 100 and 1000 – a range in which spatial processes play a minor role in our analyses under random spatial sampling but are important when sampling of individuals in space is clustered. Surprisingly, even some flying insects with huge census population sizes fall in this group, including fruit flies (*D. melanogaster*) and mosquitoes (*A. aegypti*). Last, many species likely have neighborhood sizes much larger than we simulated, including modern humans in northeastern Europe (Ringbauer *et al*. 2017). For these species demographic inference and summary statistics are likely to reflect minimal bias from spatial effects as long as dispersal is truly continuous across the landscape. Even so, we caution that association studies in which the effects of population structure are confounded with spatial variation in the environment are still sensitive to dispersal even at these large neighborhood sizes.

### Future Directions and Limitations

As we have shown, a large number of population genetic summary statistics contain information about spatial population processes. We imagine that combinations of such summaries might be sufficient for the construction of supervised machine learning regressors (e.g., Schrider and Kern 2018) for the accurate estimation of dispersal from genetic data. Indeed, Ashander *et al*. (2018) found that inverse interpolation on a vector of summary statistics provided a powerful method of estimating dispersal distances. Expanding this approach to include the haplotype-based summary statistics studied here and applying machine learning regressors built for general inference of nonlinear relationships from high-dimensional data may allow precise estimation of spatial parameters under a range of complex models.

One facet of spatial variation that we did not address in this study is the confounding of dispersal and population density implicit in the definition of Wright’s neighborhood size. Our simulations were run under constant densities, but Ringbauer *et al*. (2017)’s approach to demographic inference in space suggests that density and dispersal can in some cases be estimated separately from genetic data. Much additional work remains to be done to better understand how these parameters interact to shape genetic variation in continuous space, which we leave to future studies.
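The confounding can be made concrete with the standard two-dimensional definition of Wright's neighborhood size, 4πρσ², where ρ is population density and σ is the dispersal distance. The short sketch below uses hypothetical parameter values (not taken from our simulations) to show that a dense, low-dispersal population and a sparse, high-dispersal population can share the same neighborhood size.

```python
import math

def neighborhood_size(density, sigma):
    """Wright's neighborhood size in two dimensions: 4 * pi * density * sigma**2.

    density is individuals per unit area and sigma is the dispersal distance,
    both measured in the same length units.
    """
    return 4.0 * math.pi * density * sigma ** 2

# Hypothetical parameter values chosen only for illustration.
dense_low_dispersal = neighborhood_size(density=4.0, sigma=1.0)
sparse_high_dispersal = neighborhood_size(density=1.0, sigma=2.0)
print(dense_low_dispersal, sparse_high_dispersal)
```

Because only the product ρσ² enters the definition, any statistic that depends on the data solely through neighborhood size cannot by itself distinguish these two scenarios.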

Though our simulation allows incorporation of realistic demographic and spatial processes, it is inevitably limited by the computational burden of tracking tens or hundreds of thousands of individuals in every generation. In particular, computations required for mate selection and spatial competition scale approximately with the product of the total census size and the neighborhood size and so increase rapidly for large populations and dispersal distances. The reverse-time model of continuous space evolution described by Barton *et al*. (2010) and implemented by Kelleher *et al*. (2014) allows exploration of parameter regimes with population and landscape sizes more directly comparable to empirical cases like humans. Alternatively, implementation of parallelized calculations may allow progress with forward-time simulations.

Finally, we believe that correcting for population structure in continuous populations using principal components analysis or similar decompositions is a difficult problem, well worth considering on its own. How can we best avoid spurious correlations while correlating genetic and phenotypic variation without underpowering the methods? Perhaps optimistically, we posit that process-driven descriptions of ancestry and/or more generalized unsupervised methods may be better able to carry out this task.

## Data Availability

Scripts used for all analyses and figures are available at https://github.com/petrelharp/spaceness.

## Acknowledgements

We thank Brandon Cooper, Matt Hahn, Doc Edge, and others for reading and thinking about this manuscript. We thank the Hearth for having such good, nearby coffee, and Falling Sky Brewing for creative support during manuscript drafting. CJB and ADK were supported by NIH award R01GM117241.

## Footnotes

† These authors co-supervised this project.