Artificial neural networks for monitoring network optimisation—a practical example using a national insect survey

Monitoring networks are improved by additional sensors. Optimal configurations of sensors give better representations of the process of interest, maximising its exploration while minimising the need for costly infrastructure. By modelling the monitored process, we can identify gaps in its representation, i.e. uncertain predictions, where additional sensors should be located. Here, with data collected from the Rothamsted Insect Survey network, we train an artificial neural network to predict the seasonal aphid arrival from environmental variables. We focus on estimating prediction uncertainty across the UK to guide the addition of a sensor to the network. We first illustrate how to estimate uncertainty in neural networks, hence making them relevant for model-based monitoring network optimisation. Then we highlight critical areas of agricultural importance where additional traps would improve decision support and crop protection in the UK. Possible applications include most ecological monitoring and surveillance activities, but also weather or pollution monitoring.


Introduction
Monitoring networks capture spatiotemporal variations of one or more variables of interest.
They offer discrete measurements which can be used to predict the variable at unmeasured times and/or locations (with statistical models, as in e.g. Hiemstra et al., 2009), or to investigate the underlying biological or physical processes (with process-based models, as in e.g. Roques and Bonnefon, 2016). A desired property of the network is to capture the most significant spatiotemporal variations, so that their subsequent generalisations are meaningful. In this regard, Monitoring Network Optimisation (MNO) is needed to support many surveillance and monitoring activities that are challenged to produce meaningful data from limited resources. Thus, it is imperative that these networks address the optimal selection of sites for the greatest practical and scientific impact.
The question of where and when to sample a process has been approached with design-based and model-based frameworks (Brus and de Gruijter, 1997). The former aims at inferring population parameters while limiting spatial bias, with sampling strategies of varying degrees of randomness and stratification to ensure coverage. The latter requires the development and training of a model of the data generating process to select sampling locations that will reduce the model uncertainty (Mateu and Müller, 2012a), often simply sampling where the model is the most uncertain about its predictions. MNO is a case of optimal sampling which generally builds on an existing network and hence, existing measurements of the variable of interest (Mateu and Müller, 2012b). It is therefore naturally addressed with a model-based approach, in which historic measurements serve the development of a model which subsequently informs site selection.
Optimal sampling designs often rely on modelling approaches such as kriging (Müller, 2007; Spöck, 2012), which builds on the spatial autocorrelation of the measurements and, optionally, their regression to environmental covariates (Oliver and Webster, 2015). However, autocorrelation can appear inconsistent at the scale of sampling, preventing variograms from being fitted, particularly when a small number of measurements are available. This also affects our ability to disentangle complex relationships between measurements and environmental covariates. Machine learning approaches, such as random forests, are usually less hindered by the aforementioned conditions (Holloway et al., 2018; Fouedjio and Klump, 2019). A key limitation, however, is that machine learning methods are often unable to associate their predictions with uncertainty estimates. Nonetheless, when uncertainty estimates are provided, by mapping them from the model's feature space to the geographical space, we can guide the selection of the next sensor location. Using machine learning for such model-based optimal sampling has been referred to as uncertainty sampling, an active learning strategy (Tuia et al., 2012). Its principle is iterative: sample where the model uncertainty is the highest, increment the dataset with the new measurement, and train the model again. In our practical case, by deploying sensors in high uncertainty areas, we ensure a strategic increment to the dataset for the next training iteration.
Monitoring network optimisation has been applied to various environmental concerns such as weather (Heuvelink et al., 2012), air pollution (Helle and Pebesma, 2012) or water pollution (Do et al., 2012). However, the question of where and when to sample is also routinely considered by ecologists who are often confronted with domestic and practical concerns that dominate their decision process. These include, but are not limited to, the permission to place monitoring infrastructure in a particular location, the available staff in close proximity to service the infrastructure and the local expertise to identify species to inform measures of occupancy, phenology and abundance collected from the device. These concerns cloud strategic thinking of where a trap is best placed to inform occupancy, phenology and abundance from a biological perspective. Supporting such decisions with both statistical evidence and a range of options then allows for an objective, yet flexible improvement.

Our focus is on the Rothamsted Insect Survey (RIS) network of 12.2 m suction-traps. These traps are mostly located in agricultural areas and have been continuously monitoring insect migration and movement since 1964 (Bell et al., 2015). For the purpose of this study, the network has two objectives centred around species level prediction of aphid migration events. The first practical objective is the efficient and timely detection of seasonal aphid migration, which signals the need for bulletins and alerts to be sent to growers and other interested bodies. With this information, growers have enough time to inspect their crops and respond appropriately, thereby reducing the prophylactic use of insecticides. The second epistemic objective is to provide the best representation of aphid migration phenologies in order to increment scientific knowledge through hypothesis testing (Bell et al., 2012, 2015, 2019). This second objective is directly addressed by uncertainty sampling, i.e. the network improvement is maximised by setting a trap where the model predictions are the most uncertain. The first practical objective however, is about handling a risk: the risk of missing a seasonal infestation, potentially resulting in costly damage on crops. Hence, addressing this objective requires accounting for the distribution of crops locally at risk of infestation.

Since aphids are identified to the species level, we develop a model predicting locally when an aphid species starts its seasonal migration (i.e. first flight), thereby signalling when aphids may present a threat to agriculture. This is done for the 14 most common aphid species in the UK, taking the form of a multi-output regression (Borchani et al., 2015). It builds on an artificial neural network (ANN), an architecture enabling the development of complex relationships between inputs (the features, or covariates) and outputs (the labels, or response variables), through a succession of hidden layers of neurons. Because of their unmatched learning abilities, ANNs are proven assets in addressing tasks of ecological complexity (see e.g. Ye et al., 2019). Here, for the purpose of uncertainty sampling, we detail how ANNs can associate their predictions with uncertainty estimates. We explain how the practical and epistemic objectives described above relate to two different sources of uncertainty, both approachable with ANNs through specific augmentations. Finally, building on the estimation of those uncertainties, we provide propositions of improvement for the RIS suction trap network that address both objectives.
2 Theory: uncertainty estimation

The promises of uncertainty sampling follow a simple heuristic: by sampling where the model uncertainty is the highest, we maximise the model improvement while limiting the cost of additional sampling (Lewis and Gale, 1994). However, this hides a crucial complexity: a model can be uncertain by lack of consensus in its measurements, or simply by lack of measurement. The first case defines the aleatory uncertainty, the intrinsic variation in the data generating process. It is sometimes called irreducible uncertainty as it is not expected to reduce when generating more samples in a given region of the input space (but could be reduced by expanding the input space with additional features). The second case defines the epistemic uncertainty, which can be reduced with more samples and represents our lack of knowledge of the data generating process. Note that aleatory and epistemic uncertainties are sometimes referred to as risk and uncertainty respectively (see e.g. Osband, 2016).
Which uncertainty to sample depends on the objective of the network. What we previously described as the epistemic objective of the network, i.e. incrementing our knowledge of the aphid phenology (with e.g. better population parameter estimates), should focus on the epistemic uncertainty (Nguyen et al., 2019). This means prioritising the sampling of unexplored areas of the input space.
The practical objective of the network, i.e. the timely detection of aphid arrival, is more complex to address. On one hand, better knowledge of the process means a better generalisation and hence, more reliable declarations of infestation farther away from the traps. On the other hand, by setting traps in areas with greater aleatory uncertainty (i.e. where large variations of the response variable are expected), we allow the timely detection in areas where predictions are most unreliable. This second approach essentially provides a local improvement to the network; therefore, the local density of crops to protect should be factored into the decision.
We detail in this section how to estimate those two uncertainties with ANNs (see Hüllermeier and Waegeman, 2020, for a wider perspective on their estimation in machine learning).

Aleatory uncertainty
From similar inputs, a model can output a distribution of responses that informs us about the inherent variability within the measured system; this is known as the aleatory uncertainty. A covariate (or feature) can affect the distribution of the response variable (or label) without affecting its mean value. Such relationships, although important for risk prediction, are likely missed by a regression to the mean. Quantile regressions identify correlations between covariates and specific quantiles of the response variable distribution (Koenker and Bassett, 1978), allowing a more complete view of their relationship. They offer a perspective on extreme events and are therefore particularly relevant to risk management (Herrera et al., 2018). For example, if modelling a pest arrival, predicting a lower quantile of the response variable tells us how early an infestation can be expected locally with a set degree of confidence. The appeal and singularity of this method are that quantiles, and hence prediction intervals, can be derived without concern of error distribution or variance heterogeneity (Cade and Noon, 2003).
By switching from a regression to the mean to a quantile regression, the ANN is trained to output quantiles of the response variable distribution, typically quantiles 5%, 50% and 95%, instead of the mean. While the input shape remains unchanged, the output becomes a 1-dimensional array of as many quantiles as desired (a simultaneous quantile regression, Tagasovska and Lopez-Paz, 2019). This is done using the so-called pinball loss function instead of the quadratic loss function (or L2 loss) used in regressions to the mean. The pinball loss, more formally referred to as the asymmetric absolute loss function (Furno and Vistocco, 2018), is given by

L_τ(y, ŷ) = τ (y − ŷ)         if y ≥ ŷ,
L_τ(y, ŷ) = (1 − τ)(ŷ − y)    otherwise,     (1)

where y is an observation, ŷ the corresponding prediction and τ the desired quantile to estimate.
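As an illustration, the pinball loss is straightforward to implement; the sketch below is a minimal NumPy version (for illustration only, not the implementation used in this study):

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Asymmetric absolute (pinball) loss, averaged over observations.

    Under-predictions (y > y_hat) are weighted by tau and over-predictions
    by (1 - tau), so minimising it targets the tau-th quantile.
    """
    err = y - y_hat
    return np.mean(np.maximum(tau * err, (tau - 1.0) * err))

y = np.array([10.0, 12.0, 14.0])
# A constant prediction of 10 under-shoots two observations:
print(pinball_loss(y, 10.0, 0.95))  # heavily penalised at tau = 0.95
print(pinball_loss(y, 10.0, 0.05))  # barely penalised at tau = 0.05
```

Minimising this loss over a constant prediction recovers the empirical τ-quantile of the observations, which is why it substitutes for the L2 loss in quantile regression.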

Quantile regressions are extremely useful for problems with heterogeneous variance across the feature space, such as in our example (and in ecology in general, Cade and Noon, 2003).
They can be incorporated in statistical methods such as GAMs (Fasiolo et al., 2017) but are more often seen in machine learning approaches such as random forests (see Quantile RF, Meinshausen, 2006). In contexts of weak spatial autocorrelation of the labels and/or non-linear relationship to the features, quantile random forests have shown more reliable uncertainty estimation than kriging methods (Fouedjio and Klump, 2019). Note also that simultaneous quantile regressions are conveniently supported by the dimensional flexibility of ANNs, whose outputs can easily be made multidimensional and/or multiple (as in our example), and whose generalisation power is improved by the additional information gathered by the multiple quantiles estimated at once (Rodrigues and Pereira, 2018).
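Concretely, a simultaneous quantile regression simply averages the pinball loss over every requested quantile of a multi-column output. A hedged NumPy sketch (the array shapes and values are hypothetical):

```python
import numpy as np

TAUS = np.array([0.05, 0.50, 0.95])  # quantiles estimated at once

def simultaneous_pinball(y, q_hat, taus=TAUS):
    """Mean pinball loss of a (n,) target against a (n, 3) quantile output."""
    err = y[:, None] - q_hat                        # shape (n, len(taus))
    loss = np.maximum(taus * err, (taus - 1.0) * err)
    return loss.mean()                              # single training objective

y = np.array([100.0, 120.0, 140.0])                 # e.g. D1D in Julian days
q_hat = np.array([[ 90.0, 101.0, 115.0],            # [q5, q50, q95] per sample
                  [105.0, 119.0, 133.0],
                  [126.0, 141.0, 155.0]])
print(simultaneous_pinball(y, q_hat))
```

Because all three columns share one loss, the network's hidden layers are trained by the joint signal of all quantiles at once, which is the generalisation benefit noted above.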

Epistemic uncertainty
Epistemic uncertainty marks the lack of exploration of the feature space: the more similar a location is to the sensor locations, the lower its prediction uncertainty. A diversity of methods exists for epistemic uncertainty estimation in ANNs (Gal and Ghahramani, 2016; Gal et al., 2017; Tagasovska and Lopez-Paz, 2019) but ensemble methods seem the most accepted (Osband, 2016). Ensemble methods build on a set of models which are trained on various perturbed versions of the data and whose predictions are subsequently averaged (Lakshminarayanan et al., 2017). Bagging (from bootstrap aggregation, Breiman, 1996) is an ensemble method in which the perturbation results from sampling the training dataset with replacement, i.e. bootstrapping.

In this approach, the ANNs that compose the ensemble are trained on bootstrapped data sets with random weight initialisations, and therefore they output varying predictions.
Essentially, bootstrapping allows the approximation of the statistical population by simply resampling the sample data. Therefore, where the predictions of the ensemble disagree the most, the approximation is the most sensitive to resampling, marking the scarcity of data points in the local area of the feature space.
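The mechanism can be sketched with a toy bagging ensemble in which simple linear fits stand in for trained networks (a simplified illustration under made-up data, not the study's model):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 1-D data clustered near x = 0; the region around x = 4 is unexplored.
x = rng.normal(0.0, 1.0, size=200)
y = 2.0 * x + rng.normal(0.0, 0.5, size=200)

def fit_line(x, y):
    """Least-squares slope and intercept (stand-in for one trained network)."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Bagging: each ensemble member is trained on a bootstrap resample.
ensemble = []
for _ in range(200):
    idx = rng.integers(0, len(x), size=len(x))       # sample with replacement
    ensemble.append(fit_line(x[idx], y[idx]))

def predict_all(x0):
    return np.array([a * x0 + b for a, b in ensemble])

# Epistemic uncertainty = dispersion of the ensemble predictions.
spread_near = predict_all(0.0).std()   # inside the sampled region
spread_far = predict_all(4.0).std()    # far from any training point
print(spread_near < spread_far)        # the ensemble disagrees more where data is scarce
```

The disagreement of the resampled fits grows away from the data cloud, which is exactly the signal used to flag under-sampled regions of the feature space.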
In practice, the output of the bagging ensemble of trained models is an ensemble of predictions, from which both central tendency and dispersion measures are derived. The former constitute more robust predictions than the single model predictions, while the latter give estimates of the epistemic uncertainty. Note that for a quantile regression as in our case, we derive these measures for each of the predicted quantiles.

3 Material: measurements and covariates

Improving the monitoring network requires us to consider the diversity of its measurements and to select those that are key to its optimisation. Therefore, we consider the aphid arrival of the 14 most common species of aphids, trapped between 1965 and 2018 across the UK.

These species represent two thirds of the species reported by the RIS on its weekly bulletin (https://insectsurvey.com/).

Labels: aphid arrival

Aphid arrival happens seasonally and its time of occurrence can be quantified by the day of first flight (DFF). The DFF is a critical phenological event for pest control and ecology. It represents the Julian day at which a given species is first detected in a suction-trap catch. However, this variable is subject to large variations caused by the occasional extreme outlier, so predicting instead the day at which a percentile of the yearly catches is reached can be more insightful (as in e.g. Thackeray et al., 2010, 2016; Sheppard et al., 2016; Pak et al., 2019). We focused on the day of 1% detection (D1D), i.e. the Julian day at which the first percentile of the yearly abundance was detected, as a proxy for aphid arrival: it is practically equivalent to the DFF but shows a lower variance and a more symmetric distribution, making it easier to predict (see the measured D1D in Figure 1-B).
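As a sketch with hypothetical daily counts, the D1D of one year and one species can be computed as follows (the counts are made up; this is not the RIS processing code):

```python
import numpy as np

def day_of_1pc_detection(daily_counts):
    """Julian day at which the cumulative catch first reaches 1% of the yearly total.

    `daily_counts[d]` is the catch on Julian day d + 1.
    Returns None for years with no catch.
    """
    total = daily_counts.sum()
    if total == 0:
        return None
    cum = np.cumsum(daily_counts)
    return int(np.argmax(cum >= 0.01 * total)) + 1  # first day meeting the threshold

# Hypothetical season: nothing until day 120, then a ramp of catches.
counts = np.zeros(365)
counts[119:200] = np.arange(1, 82)

counts_outlier = counts.copy()
counts_outlier[29] = 1            # one stray catch: the DFF jumps to day 30,
print(day_of_1pc_detection(counts))
print(day_of_1pc_detection(counts_outlier))  # but the D1D is barely affected
```

The single stray catch moves the DFF from day 120 to day 30, while the D1D stays put, which is the robustness motivating its use as the label.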

Features: environmental covariates
Aphid phenology, and more precisely aphid arrival, has been shown to depend on climate and land-cover variables. Temperature appears to be the main driver (Zhou et al., 1995) and the current trend for warmer winters translates very distinctly into earlier captures (Harrington et al., 2007; Bell et al., 2019). Figure 1-B also illustrates this trend in time, as well as across latitude. Precipitation effects are less direct but also expected, through the drought-stress of plant hosts (Mcvean and Dixon, 2001). Finally, land-use also appears to be a useful predictor, although through non-linear and interacting effects (Cocu et al., 2005b). Therefore, as established drivers of aphid arrival, temperature, precipitation and land-use constitute the key features of our regression.
Climate observations were obtained from the HadUK grid (Hollis et al., 2019). It offers, among many other measurements, daily temperatures and rainfall at varying spatial resolutions (we use 5km×5km). We simplify the dataset by averaging the daily temperature and rainfall into monthly features, from September of year n − 1 to August of year n. By doing so we aim to reduce the input noise, as we expect the impact of the weather on aphid phenology to develop over longer time-scales than daily. We also include the number of frost days per winter month.
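This aggregation step can be sketched with pandas on a synthetic daily series (the variable names, values and grid cell are hypothetical; HadUK-Grid provides the real inputs):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily series for one 5 km x 5 km cell, September 1999 to August 2000.
days = pd.date_range("1999-09-01", "2000-08-31", freq="D")
daily = pd.DataFrame({
    "tmean": 10 + 8 * np.sin(2 * np.pi * (days.dayofyear - 105) / 365)
             + rng.normal(0, 2, len(days)),        # crude seasonal cycle
    "rain": rng.gamma(1.0, 2.0, len(days)),
}, index=days)

# Monthly features: mean temperature and rainfall, plus frost-day counts.
monthly = pd.DataFrame({
    "tmean": daily["tmean"].resample("MS").mean(),
    "rain": daily["rain"].resample("MS").mean(),
    "frost_days": (daily["tmean"] < 0).resample("MS").sum(),
})
print(monthly.shape)  # one row per month, September to August
```

Each cell-year thus collapses from 365 daily values per variable to 12 monthly features, which is the noise-reduction rationale given above.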
Land-use data comes from the CORINE land-cover inventories (CLC 2000, 2006, 2012).

We also added static topographical features to the set of covariates. First, distance to traps, which allows for proximity effects to arise in a more flexible way than by fitting a variogram (Hengl et al., 2018), allowing predictions to build on the spatial autocorrelation of the labels (Cocu et al., 2005a). Finally, elevation and distance to sea were included to capture additional details of the local climates (specifically with regard to wind). The result is a total of 86 features collected everywhere in the UK on a 5km×5km resolution, every year since 1965.

Their distributions are shown in Appendix B.

Multi-output quantile regression
Our multi-output quantile regression predicts quantiles of the D1D of the 14 species of interest.
It builds on a specific ANN architecture: 3 hidden layers common to the 14 species, followed by one species-specific hidden layer of a smaller size. Complex interactions of covariates can thus develop in the shared layers, while the smaller species-specific layers specialise the predictions.

Uncertainty estimation is not a common feature in ANN libraries. As mentioned previously, estimating aleatory uncertainty requires the implementation of the specific pinball loss function, while, as presented here, estimating epistemic uncertainty requires training an ensemble of networks on bootstrapped data sets (i.e. bagging), which is computationally demanding.
In addition to those technical developments, and following Osband et al. (2018), we add Randomised Prior Functions to the network. This mechanism builds on an untrainable parallel network whose output is added to the trained network output, thereby acting as a Bayesian prior (see Figure 6 in Appendix A for a schematic view of the network). Then, the bagging predictions provide an approximate posterior distribution from which to estimate epistemic uncertainty. This (technically optional) prior mechanism corrects a tendency for overconfidence in the bootstrap ensemble models in situations of weak measurement heterogeneity (i.e. low aleatory uncertainty). Hence, it helps disentangle the confounding effects of data scarcity and data heterogeneity, resulting in epistemic uncertainty estimates that remain unaffected by measurement heterogeneity (from which aleatory uncertainty arises; see appendix C.5 of Osband et al., 2018, for an illustration).
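The prior mechanism can be sketched in NumPy with linear members and random sinusoidal priors standing in for the untrainable parallel network (a toy illustration of the idea in Osband et al., 2018, not the study's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: a linear signal sampled around x = 0.
x = rng.normal(0.0, 1.0, size=200)
y = 2.0 * x + rng.normal(0.0, 0.3, size=200)

def make_member(x, y, beta=1.0):
    """One ensemble member: a trainable linear fit added to a fixed random prior.

    The prior (a random sinusoid here) is never trained; the trainable part is
    fitted to the residuals y - prior(x) on a bootstrap resample, so the full
    prediction is fit(x0) + prior(x0).
    """
    freq, phase = rng.normal(0.0, 1.0), rng.uniform(0.0, 2.0 * np.pi)
    prior = lambda x0: beta * np.sin(freq * x0 + phase)
    idx = rng.integers(0, len(x), size=len(x))          # bootstrap resample
    A = np.column_stack([x[idx], np.ones(len(idx))])
    coef, *_ = np.linalg.lstsq(A, y[idx] - prior(x[idx]), rcond=None)
    return lambda x0: coef[0] * x0 + coef[1] + prior(x0)

members = [make_member(x, y) for _ in range(100)]

def epistemic(x0):
    """Dispersion of the ensemble predictions at x0."""
    return float(np.std([m(x0) for m in members]))

print(epistemic(0.0) < epistemic(5.0))  # more uncertain far from the data
```

Near the data the trained parts compensate the priors, so members agree; far from the data the untrained priors keep the members apart, sustaining epistemic uncertainty even where bootstrap resamples alone would happen to agree.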
In order to reduce the input noise we run a simple feature selection consisting of dropping correlated features with a threshold of 95% correlation. Of the 86 features described above, 63 remained for training. Thresholds of 90% and 100% (no dropping) were also tested but were found to reduce performance. This approach does not guarantee that the removed features are not decisive predictors; it only ensures that the signal they carry is not withdrawn with them as the data set is reduced.
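This selection step amounts to a greedy pairwise filter; a pandas sketch with hypothetical feature names (not the study's exact procedure):

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.95):
    """Greedily drop one feature from every pair whose |Pearson r| > threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(0)
t_jan = rng.normal(4.0, 2.0, 500)                 # hypothetical January temperature
df = pd.DataFrame({
    "t_jan": t_jan,
    "t_feb": t_jan + rng.normal(0.0, 0.1, 500),   # nearly identical to t_jan
    "rain_jan": rng.gamma(2.0, 1.5, 500),         # unrelated
})
print(list(drop_correlated(df).columns))          # the redundant month is dropped
```

The surviving feature of each highly correlated pair carries essentially the same signal as the dropped one, which is the point made in the text.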

The ANN was developed with Keras, Tensorflow and Scikit-learn in Python 3.6. Details on the ANN architecture, activation, regularisation, model selection, training and residuals are given in Appendix A.

Monitoring network optimisation
To the best of our knowledge, uncertainty sampling is always addressed from an epistemic point of view, aiming at the exploration of the feature space. If, like ordinary kriging, a model builds on the labels' spatial autocorrelation only, the feature space and the geographical space are conflated and uncertainty arises with the geographical distance from measurement locations (see e.g. Chen et al., 2016). In such cases, uncertainty sampling results in space-filling designs.
However, as environmental covariates are added to the model, unexplored areas of the feature space, which hold the highest uncertainty, do not necessarily correspond to unexplored areas of the geographical space (Brus and Heuvelink, 2007). Here however, as most of our environmental covariates exhibit strong spatial autocorrelation, a space-filling design could be expected despite their large number. Addressing our epistemic objective of improving the predictions of aphid arrival, we seek to identify the locations where the function

f_E(x) = Σ_i U^E_x,i     (2)

is greatest, where x marks the spatial coordinates (x, y), and U^E_x,i is the epistemic uncertainty of species i at location x. Like the DFF itself, it is measured in days. The measure of U^E used here is the dispersion of the median (50% quantile) predictions across the bagging ensemble.

On the other hand, areas of high aleatory uncertainty do not correspond to unexplored regions of the feature space (Nguyen et al., 2019). The predictions there are relatively informed by contemporary and past measurements of the network, but still they remain largely undetermined. Those are areas in which given weather and land-use conditions can produce a wide range of responses, i.e. early D1D are likely but so are late ones. In those areas, the nearest trap measurements are poorly informative of aphid arrival, and therefore farmers cannot rely on them for timely reaction. Setting a trap locally would address this issue, provided that such high aleatory uncertainty areas are properly identified by the quantile regression.
However, minimal learning improvement should be expected from setting traps in high aleatory uncertainty areas. Sampling this irreducible uncertainty only addresses the immediate and local objective of timely declaration of infestation. Of course, unpredictable infestations only constitute a risk if susceptible crops are locally present. Hence, to account for the local value of a new trap, we started with the distribution of the main crops in the UK (from UKCEH, 2018, Land Cover® plus: Crops maps, see Figure 2). We then asked aphid experts to devise a link matrix l_i,j between aphid species and the risk they pose to those crops (see Table 1 in Appendix C). Setting a new trap addressing the practical objective can be done by identifying locations where the function

f_A(x) = Σ_i Σ_j U^A_x,i C_x,j l_i,j     (3)

is greatest; here U^A_x,i is the aleatory uncertainty of species i at location x; C_x,j the proportion of land unit x occupied by crop j; and l_i,j is the susceptibility of crop j to species i. The measure of U^A used here is the 90% prediction interval of the DFF (with the same unit): q_0.95 − q_0.05.
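Once the uncertainty maps, crop proportions and link matrix are gridded, both selection criteria reduce to simple array reductions. A NumPy sketch with made-up values and hypothetical shapes (1000 grid cells, 14 species, 6 crops):

```python
import numpy as np

rng = np.random.default_rng(0)
n_loc, n_species, n_crops = 1000, 14, 6          # hypothetical grid and crop list

U_E = rng.gamma(2.0, 3.0, (n_loc, n_species))    # epistemic uncertainty, in days
U_A = rng.gamma(2.0, 5.0, (n_loc, n_species))    # 90% prediction interval, in days
C = rng.dirichlet(np.ones(n_crops), n_loc)       # crop proportions per land unit
l = rng.uniform(0.0, 1.0, (n_species, n_crops))  # species-to-crop susceptibility

# Epistemic objective: total epistemic uncertainty per location.
f_E = U_E.sum(axis=1)

# Practical objective: aleatory uncertainty weighted by local crop risk.
f_A = np.einsum("xi,xj,ij->x", U_A, C, l)

print(int(np.argmax(f_E)), int(np.argmax(f_A)))  # candidate trap locations
```

The two argmax locations generally differ, reflecting the distinct spatial distributions of the two uncertainties discussed in the Results.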

4 Results

4.1 Epistemic objective: improving predictions
For most aphid species, the epistemic uncertainty of the D1D increases with altitude (see Figure 3). Those areas, in addition to being historically deprived of measurements, are environmentally distinct from most trap locations, mainly because they are colder, which likely delays first flights until later in the season. They represent a sharply defined region of the feature space in which predictions are merely extrapolations of distant measurements. It is therefore unsurprising to see that the model predictions are the most uncertain at altitude.
However, setting a trap in the Highlands might seem irrelevant to agricultural production: in addition to there being no crops to protect there, aphids are not a problem in such environments. It is nonetheless where the best generalisation improvement would occur, maximising the reduction of the overall uncertainty at the next training iteration (i.e. after the next season's catches). However, once we exclude from the problem regions of limited agronomic interest, new areas of high uncertainty appear. Figure 5-A shows that Cornwall, Northern Ireland or Dyfed county in Wales are places of high agricultural interest that are not only geographically remote from the current monitoring network, but also environmentally different from it. In other words, new trap locations are not only selected by the spatial imbalance of the current trap locations, but by the sampling imbalance within the whole feature space.

4.2 Practical objective: protecting crops

Figure 4 shows the distribution of aleatory uncertainty of the D1D for each species. Note that it is distributed differently to the epistemic uncertainty (Figure 3). From the information in Figures 4 and 2, and according to Equation 3, optimal locations for a new trap can be derived (see Figure 5-B). Several high value areas are highlighted, notably along the river Tweed on the English-Scottish border, in Cambridgeshire, or also in Yorkshire where, as shown in Figure 5-B, a trap used to be operational.

Discussion
Through a logical process of estimating aphid phenology relative to meteorological and landscape drivers, our models show that critical areas of agricultural importance are lacking the trap infrastructure needed to estimate first flight and other metrics. If populated, these new trap locations will improve decision support in UK agriculture, thereby reducing risk of deleterious feeding by aphids and virus transmission by aphids to crops.
The network can be improved in two ways: for improved prediction of aphid arrival (the epistemic objective), or for reduced risk of missing crop infestations (the practical objective).
The former is a UK-wide improvement, while the latter is a local improvement that depends on the presence of crops needing protection. For both, our approach proposes multiple locations for setting a new trap, providing the decision maker with the degrees of freedom needed in the process of augmenting the current infrastructure. To address this, we have relied on the flexibility of ANNs.
Although they are mostly known for classification problems, ANNs have long proven their ability for non-linear and multivariate regressions (see e.g. Comrie, 1997). For the specific MNO problem addressed here, the basic feed-forward ANN needed to be augmented in 3 ways: (1) become a multi-output regression to account for the 14 species of interest, (2) become a quantile regression so that aleatory uncertainty could be examined, and (3) be trained as a bagging ensemble so that epistemic uncertainty could be estimated.

In uncertainty sampling, the target is typically exclusively the epistemic uncertainty (Nguyen et al., 2019). However, many monitoring networks, like the present one, are also destined to detect and prevent a risk, which is better approached by aleatory uncertainty. We showed that those two uncertainties distribute differently, resulting in diverging propositions of location for setting a new trap. Proceeding further is then a subjective choice, balancing risk management and a quest for knowledge.
Uncertainty sampling normally proceeds iteratively, presenting an oracle (e.g. a human operator) with an uncertain sample to label, which is subsequently added to the training dataset, enabling deeper learning at the next training iteration. The MNO problem presented here is a single iteration of this active learning process, in which the oracle's answer results from setting a new trap for the next season's measurements. From season to season, as a slow-paced active learning process unfolds, the learner (i.e. our regression model) improves.
The network improvement discussed here is based on the simplest lever: adding a new trap. However, we could address the MNO from a more comprehensive perspective, including addition, deletion and relocation of one or more traps. Finding the optimal scenario would then require accounting for costs, both of setting a new trap and of losing a given crop to aphids. Because of the scales at stake, such an experiment should be done in silico, e.g.
training a process-based model of the aphid population dynamics, and optimising an iterative improvement process. This would help in appreciating the potential improvement offered by a single trap addition.
In practice, many factors other than reducing uncertainty or risk affect decisions about network improvement. For example, we proposed here that the epistemic objective should account for the presence of agricultural areas, already blurring the distinction with the practical risk-based objective. It is also important to mention that sensors have perception ranges within which a new sensor is merely redundant (a radius estimated at 80 km for the suction traps, Benton et al., 2002). Accounting for such an exclusion radius will mechanically result in more space-filling designs. Additionally, the domestic and practical considerations mentioned in the Introduction, such as having staff able to regularly visit traps and identify catches, are also constitutive of such a decision process. Nonetheless, our approach is not scale, location or infrastructure dependent and could be used to improve most ecological monitoring and surveillance activities across the world that have specific network objectives, such as detection of occupancy, phenology and abundance.
As training ANNs requires large data sets, our 881 samples appear a limitation of this study.
However, every sample includes 14 response measurements and, as they are used to train mostly shared parameters, the small data set effect is attenuated. Nonetheless, it is clear that simpler machine learning methods would provide equivalent learning in this exemplary case. However, only ANNs provide all the flexibility needed here, with custom loss functions, multi-outputs and bagging. This makes ANNs great assets for uncertainty estimation even with small data sets, provided that overfitting is properly controlled.

An immediate extension to this work is therefore its application to the rest of Europe (as did Harrington et al., 2007, also with suction traps). The incremented dataset would satisfy the inherent data-hungriness of ANNs, while also extending the input space to new climates.
In the UK, this would result in improved predictions, and in different epistemic uncertainty distributions as a lot of predictions would no longer result from extrapolation.

Reproducibility
The ANN training and predictions can be reproduced from the publicly available online dataset (Bourhis et al., 2020) and code.

Appendices

A Model architecture, selection and residuals
Our artificial neural network architecture comprises two dichotomies. Firstly, there is a shared part of the network and species-specific parts. The former is detailed in the upper half of Figure 6, while the bottom half details the latter. Secondly, there is a trained part of the network and an untrained one (in grey in Figure 6).
The data is composed of 881 samples for which we have 63 features (the environmental covariates), noted x_1 to x_63, and 14 labels (the D1D of the 14 species of interest), noted y_1 to y_14. The 14 labels could be output as a 1-dimensional array, but as the different quantiles are already output this way, the 14 labels require multiple outputs. We therefore have 14 outputs, each one of length 3 for the 3 quantiles of interest: 5%, 50% and 95%.

Various activation functions are used here. The output activation function is linear, as is common for regression. The species-specific layers are activated with tanh functions, while the main shared network has ReLU (rectified linear unit) activation. The untrained shared network is activated with elu (exponential linear unit) functions (Michelucci, 2018).
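The resulting input/output shapes can be checked with a minimal NumPy forward pass (random weights and placeholder layer widths, for shape illustration only; not the trained network):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda a: np.maximum(a, 0.0)

n_features, n_species, n_quantiles = 63, 14, 3

# Shared trunk: three ReLU layers (the widths here are placeholders, not the tuned ones).
shared_W = [rng.normal(0, 0.1, s) for s in [(63, 64), (64, 64), (64, 32)]]
# Per species: one smaller tanh layer, then a linear output of the 3 quantiles.
heads = [(rng.normal(0, 0.1, (32, 8)), rng.normal(0, 0.1, (8, n_quantiles)))
         for _ in range(n_species)]

def forward(x):
    """Map (n, 63) features to 14 outputs of shape (n, 3): quantiles 5/50/95%."""
    h = x
    for W in shared_W:
        h = relu(h @ W)                                   # shared ReLU trunk
    return [np.tanh(h @ W1) @ W2 for W1, W2 in heads]     # tanh layer + linear output

outputs = forward(rng.normal(size=(5, n_features)))
print(len(outputs), outputs[0].shape)  # 14 outputs, each (5, 3)
```

Most parameters sit in the shared trunk, which is why all 14 labels jointly attenuate the small-data-set effect discussed in the text.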
The network deepness (i.e. number of successive layers), the layer dimensions (i.e. number of units per layer), as well as, subsequently, the ratio of shared to species-specific numbers of parameters are hyper-parameters requiring tuning. Tuning, or more formally model selection, was done through cross-validation, reserving 20% of the samples as the testing dataset and using the 80% remaining for training. The split is done according to a semi-deterministic method referred to as systemic stratified (Wu et al., 2013), shown to best distribute the samples when applied to small data sets and low label variance (Zheng et al., 2018), which is our case here. This procedure of random splitting and subsequent training/testing is iterated 10 times.

Following model selection, training is done within a bagging procedure. Figure 9 shows the bagging residuals (i.e. averaged outputs) for every aphid species as predicted by the selected model. For the sake of illustration, here the dataset is also split into testing and training datasets. However, the whole dataset is used for training once the objective is to map uncertainty across the UK (Figures 3 and 4).