Estimation of spatial demographic maps from polymorphism data using a neural network

A fundamental goal in population genetics is to understand how variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and source-sink dynamics of dispersal should shape genetic variation over space; however, few tools currently available can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input-output pairs, where the input consists of genotypes and sampling locations generated from a continuous-space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the approaches; in particular, our program is unique in that it infers the magnitude of both dispersal and density, as well as their variation over the landscape, and does so using SNP data. Comparable methods are limited to estimating relative migration rates or require identity-by-descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters but was affected by incomplete spatial sampling. Genetics-based methods like ours complement other, direct methods for estimating past and present demography, and we believe they will serve as valuable tools for applications in conservation, ecology, and evolutionary biology. An open-source software package implementing our method is available from https://github.com/kr-colab/mapNN.
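Each training example pairs simulated genotypes and sampling locations with the true maps that generated them, and the network consumes genotypes two individuals at a time (layer G1 in Figure S1). A minimal standard-library sketch of that data layout follows. The container, the 0/1/2 allele-count encoding, and drawing k distinct unordered pairs at random are all assumptions for illustration, not mapNN's actual code; `TrainingExample`, `sample_pairs`, and `pair_genotypes` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class TrainingExample:
    """One simulated input/output pairing (hypothetical container).

    genotypes: s rows (SNPs) by n columns (individuals); 0/1/2 allele
               counts are assumed here for illustration.
    locations: n (x, y) sampling coordinates.
    dispersal, density: w-by-w maps of the true demographic parameters.
    """
    genotypes: list
    locations: list
    dispersal: list
    density: list

    def validate(self):
        """Check shapes and return (s, n, w)."""
        n = len(self.locations)
        if any(len(row) != n for row in self.genotypes):
            raise ValueError("need one genotype column per sampled individual")
        w = len(self.dispersal)
        for grid in (self.dispersal, self.density):
            if len(grid) != w or any(len(row) != w for row in grid):
                raise ValueError("target maps must both be w-by-w")
        return len(self.genotypes), n, w


def sample_pairs(n, k, seed=None):
    """Draw k distinct unordered pairs of individuals at random (assumed scheme)."""
    rng = random.Random(seed)
    all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return rng.sample(all_pairs, k)


def pair_genotypes(example, pair):
    """The (s, 2) input for one pair: both individuals' genotype columns."""
    i, j = pair
    return [(row[i], row[j]) for row in example.genotypes]
```

Keeping genotypes, locations, and both target maps in one record makes it easy to reject a malformed simulation before it reaches training.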

Figure S1: Extended caption for Figure 1. Visualized tensor sizes are proportional to the cube root of the actual dimensions, assuming 5,000 SNPs, 10 pairs, and a map width of 10. The description and output size of each layer are given below.
• G1. (s, 2) Genotypes for a pair of individuals. This branch of the network is repeated for multiple pairs.
• G9. (kw², 128) The outputs from k pairs are stacked together and then duplicated for each of the w² pixels.
• L1. (kw², 7) Locations table for every combination of pixel and genotype pair (not all rows are shown).
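The layer sizes listed for Figure S1 follow from simple bookkeeping over s SNPs, k pairs, and a w-by-w output map. The sketch below reproduces the three listed shapes for the figure's example settings and shows one way to duplicate per-pair features across pixels. The pair-major row ordering, the per-dimension reading of the cube-root scaling, and the helper names (`layer_shapes`, `display_dims`, `tile_pair_features`) are assumptions for illustration; the contents of the seven location columns are not reconstructed here.

```python
from itertools import product


def layer_shapes(s, k, w):
    """Output shapes of the layers G1, G9, and L1 from the Figure S1 caption."""
    return {
        "G1": (s, 2),           # genotypes for one pair of individuals
        "G9": (k * w**2, 128),  # k pair outputs, duplicated for each of w**2 pixels
        "L1": (k * w**2, 7),    # one locations row per (pixel, pair) combination
    }


def display_dims(shape):
    """Cube root of each dimension (assumed reading of the figure's scaling)."""
    return tuple(round(d ** (1 / 3), 2) for d in shape)


def tile_pair_features(pair_features, w):
    """Repeat each pair's feature vector once per pixel of a w-by-w map.

    Returns the k * w**2 tiled rows plus, for each row, the (pair, row, col)
    combination it stands for. Pair-major ordering is a hypothetical choice.
    """
    rows, index = [], []
    for pair_id, (r, c) in product(range(len(pair_features)),
                                   product(range(w), range(w))):
        rows.append(list(pair_features[pair_id]))
        index.append((pair_id, r, c))
    return rows, index


# Example settings from the caption: 5,000 SNPs, 10 pairs, map width 10.
for name, shape in layer_shapes(5000, 10, 10).items():
    print(name, shape, display_dims(shape))
```

Duplicating each pair's 128-dimensional summary once per pixel lets every row be combined with a location row describing that same (pixel, pair) combination, which is why G9 and L1 share the leading dimension kw².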

Figure S2: Predicted maps for a randomly selected simulated test dataset. The leftmost column shows the ground-truth maps for dispersal (top row) and density (bottom row). Columns 2-4 show the maps estimated by three methods: mapNN, FEEMS, and MAPS, respectively.

Figure S3: Predicted maps for a randomly selected simulated test dataset. The leftmost column shows the ground-truth maps for dispersal (top row) and density (bottom row). Columns 2-4 show the maps estimated by three methods: mapNN, FEEMS, and MAPS, respectively.

Figure S4: Predicted maps for a randomly selected simulated test dataset. The leftmost column shows the ground-truth maps for dispersal (top row) and density (bottom row). Columns 2-4 show the maps estimated by three methods: mapNN, FEEMS, and MAPS, respectively.

Figure S5: Predicted maps for a randomly selected simulated test dataset. The leftmost column shows the ground-truth maps for dispersal (top row) and density (bottom row). Columns 2-4 show the maps estimated by three methods: mapNN, FEEMS, and MAPS, respectively.

Figure S6: Predicted maps for a randomly selected simulated test dataset. The leftmost column shows the ground-truth maps for dispersal (top row) and density (bottom row). Columns 2-4 show the maps estimated by three methods: mapNN, FEEMS, and MAPS, respectively.

Figure S7: Predicted maps for a randomly selected simulated test dataset for the North American grey wolf analysis. The left column shows estimated values, and the right column shows uncertainty: the heat map conveys the width of each pixel-wise 95% confidence interval from parametric bootstrapping. The first row is for dispersal rate, and the second row for density.

Figure S8: Predicted maps for a randomly selected simulated test dataset for the North American grey wolf analysis. The left column shows estimated values, and the right column shows uncertainty: the heat map conveys the width of each pixel-wise 95% confidence interval from parametric bootstrapping. The first row is for dispersal rate, and the second row for density.

Figure S9: Predicted maps for a randomly selected simulated test dataset for the North American grey wolf analysis. The left column shows estimated values, and the right column shows uncertainty: the heat map conveys the width of each pixel-wise 95% confidence interval from parametric bootstrapping. The first row is for dispersal rate, and the second row for density.

Figure S10: Predicted maps for a randomly selected simulated test dataset for the North American grey wolf analysis. The left column shows estimated values, and the right column shows uncertainty: the heat map conveys the width of each pixel-wise 95% confidence interval from parametric bootstrapping. The first row is for dispersal rate, and the second row for density.

Figure S11: Predicted maps for a randomly selected simulated test dataset for the North American grey wolf analysis. The left column shows estimated values, and the right column shows uncertainty: the heat map conveys the width of each pixel-wise 95% confidence interval from parametric bootstrapping. The first row is for dispersal rate, and the second row for density.
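The uncertainty panels in Figures S7 to S11 summarize each pixel by the width of a 95% interval across parametric bootstrap replicates, that is, maps re-estimated from data simulated under the fitted parameters. The captions do not say how the interval is formed, so the sketch below assumes a percentile interval with linear interpolation and takes the replicate maps as already computed; `percentile` and `ci_width_map` are hypothetical helpers, not part of mapNN.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def ci_width_map(replicates, level=0.95):
    """Pixel-wise width of a percentile interval across bootstrap replicates.

    replicates: B re-estimated maps, each a list of rows of pixel values,
    e.g. obtained by simulating data under the fitted maps and re-running
    the estimator (the parametric bootstrap).
    """
    tail = 100 * (1 - level) / 2  # 2.5 for a 95% interval
    height, width = len(replicates[0]), len(replicates[0][0])
    widths = []
    for i in range(height):
        row = []
        for j in range(width):
            vals = [rep[i][j] for rep in replicates]
            row.append(percentile(vals, 100 - tail) - percentile(vals, tail))
        widths.append(row)
    return widths
```

Reporting the interval width rather than its endpoints gives a single heat map per parameter, which matches the one-panel-per-row layout of these figures.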

Figure S12: PNG renderings of a random selection of training maps for the benchmark analysis. The blue channel conveys dispersal rate, and the red channel conveys carrying capacity.
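Figures S12 and S14 pack two maps into one image, with carrying capacity in the red channel and dispersal rate in the blue channel. A minimal standard-library sketch of such an encoding follows. It assumes linear min-max scaling to 8 bits with caller-supplied bounds and an unused (zero) green channel, neither of which the captions specify; `to_byte`, `encode_rgb`, and `png_bytes` are hypothetical names.

```python
import struct
import zlib


def to_byte(x, lo, hi):
    """Linearly rescale x from [lo, hi] to an integer in 0..255 (clipped)."""
    t = (x - lo) / (hi - lo)
    return max(0, min(255, round(t * 255)))


def encode_rgb(dispersal, capacity, sigma_range, k_range):
    """Red = carrying capacity, blue = dispersal rate, green unused (0)."""
    return [
        [(to_byte(kv, *k_range), 0, to_byte(sv, *sigma_range))
         for sv, kv in zip(srow, krow)]
        for srow, krow in zip(dispersal, capacity)
    ]


def png_bytes(rgb):
    """Serialize rows of (r, g, b) tuples as an 8-bit truecolor PNG."""
    height, width = len(rgb), len(rgb[0])
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(c for px in row for c in px) for row in rgb)

    def chunk(tag, data):
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)  # color type 2 = RGB
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))
```

For example, `open("map.png", "wb").write(png_bytes(encode_rgb(sigma_map, k_map, (0.1, 1.0), (0.0, 10.0))))` would write one rendering, where `sigma_map`, `k_map`, and both ranges are placeholders. Because the scaling bounds are fixed per analysis, a channel value maps back to a parameter value without per-image metadata.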

Figure S13: The minimum dispersal rate (σ) supporting a stable population for different carrying capacity (K) values.

Figure S14: PNG renderings of a random selection of training maps for the North American grey wolf analysis. The blue channel conveys dispersal rate, and the red channel conveys carrying capacity.