## Abstract

Rainfall-induced landslides and soil erosion are part of a complex system of multiple interacting processes, and both are capable of significantly affecting sediment budgets. These sediment mass movements also have the potential to significantly affect the health, functionality and services of a broad network of ecosystems. To support the integrated assessment of these processes it is necessary to develop reliable modelling architectures. This paper proposes a semi-quantitative integrated methodology for a robust assessment of soil erosion rates in data-poor regions affected by landslide activity. It combines heuristic, empirical and probabilistic approaches. The proposed methodology is based on the geospatial semantic array programming paradigm and has been implemented at the catchment scale using GIS spatial analysis tools and GNU Octave. The integrated data-transformation model relies on a modular architecture, where the information flow among modules is constrained by semantic checks. In order to improve computational reproducibility, the geospatial data transformations implemented in ESRI ArcGIS are also made available in the free software GRASS GIS. The proposed modelling architecture is flexible enough for future transdisciplinary scenario analyses to be more easily designed. In particular, the architecture might contribute as a novel component to simplify future integrated analyses of the potential impact of wildfires, or of vegetation types and distributions, on sediment transport from water-induced landslides and erosion.

## 1 Introduction

Hillslope processes can be envisaged as a cascade where surface erosion and mass movements are visible expressions of critical instabilities in a complex system of interacting processes that control the downslope movement of material ([1], cited in [2]). Field observations, modelling simulations and experimental studies have shown that soil erosion can vary considerably due to the changes in soil properties, vegetation cover and topography occurring after a landslide (e.g. [3, 4, 5]). Following landslide events, the changes in soil erosion rates can be strong enough to deliver significant cascading impacts on ecosystems, for example through an increased sediment yield to a stream network. This may be of ecological and economic relevance not only locally (possibly driving complex changes even at the landscape scale [6, 7]) but also off-site, whenever ecosystem services are important for service benefit areas connected through service connecting areas [8] (e.g. stream networks).

As natural resources are intrinsically entangled in complex networks, there is a growing awareness of the importance of these cascades. This, in turn, is driving the development of integrated risk assessment and multi-purpose use optimization of different resources, with the aim of developing appropriate management policies that can reliably account for the potential influence of climate change on these process cascades and assess the resultant economic and societal consequences.

Landslide events result in changes in topography and vegetation cover, which in turn alter surface erosion rates and sediment yields. A number of relevant models use an integrated approach to soil erosion and landslide processes, including SHETRAN [9], TOPOG [10, 11], PSIAC [12] and SIBERIA [13]. WEPP-SLIP (Water Erosion Prediction Project Shallow Landslide Integrated Prediction) [3] is a model that explicitly considers post-failure sediment yield. It integrates the physical basis of the WEPP model [14] with the infinite slope stability model of Skempton and DeLory [15]. WEPP-SLIP is able to consider the post-failure changes in soil erosion rate through the changes in topography and land cover.

Physically based models use a dynamic hydrological approach and local terrain characteristics for estimating spatial and temporal landslide probability [16]. Their main limits are that they are often optimised for small catchments and local conditions, and that they require in-depth knowledge of local soil and climatological parameters [17]. Empirical methods are mainly based on the estimation of thresholds related to the precipitation patterns which result in landslide occurrence [16]. This approach generally requires high temporal resolution rainfall data, which are often not available, and does not necessarily model the right processes. In addition, its applicability is limited to the conditions under which it was developed [18, 17]. There is therefore still room to improve the modelling of the interactions of these processes, for example through assessments of the changes in the surface area made more susceptible to soil erosion following landslide events.

To quantify the potential changes in soil erosion due to landslide occurrence it is necessary to know where and when on the slope a landslide initiates and how it evolves. This paper presents a new modelling approach for data-poor regions, in an attempt to improve the estimation of sediment budgets derived from rainfall-induced landsliding and soil erosion. A statistical approach is proposed that incorporates the frequency-area landslide distribution model of Malamud et al. [19] within the framework of a spatially distributed empirical soil erosion model.

## 2 The study area

The study area (Fig. 1) is situated in southern Italy in the Daunia Apennines of the Puglia region, within the municipal territory of Rocchetta Sant’Antonio. It covers an area of almost 10 km². This area is highly susceptible to landslide activity [20, 21], with a consequent negative impact on the local economy [22]. The area neighbouring the Rocchetta Sant’Antonio territory to the north-west presents a landslide frequency exceeding 20% of the overall area [23, 24, 22, 25]. Soil erosion is also widespread, and its severity is largely determined by the combination of tillage practices and the high erodibility of the clay-rich flysch units from which some of the local soils are derived [26]. Within the catchment it is possible to distinguish four major classes of land use (agricultural soils, woodland, pastures and grassland) and three dominant lithologies (limestone, sandstone and clay). Slope angles are on average approximately 10 degrees, with peak slope angles rarely exceeding 25 to 30 degrees. An ephemeral drainage network is fed by precipitation during the autumn-winter period, when some 600 to 750 mm of rainfall is common [22]. The area is characterized by a Mediterranean sub-humid climate.

## 3 A new architecture for coupling of the effects of rainfall-induced shallow landslides and soil erosion

### 3.1 Geospatial semantic array programming

Array programming is an approach for simplifying complex algorithm prototyping with an accurate and compact mathematical description. It originated as a means of reducing the gap between mathematical notation and its implementation within a model’s algorithms in a formalised and reproducible way. As stated by Iverson [27]: “the advantages of executability and universality found in programming languages can be effectively combined, in a single coherent language, with the advantages offered by mathematical notation”. Array programming has been used for building the architecture of our modelling approach. To mitigate the complexity of transdisciplinary modelling and the inconsistencies between input data, parameters and output, semantic checks on the processed information and a modularisation of the key parts of the model were introduced following the semantic array programming paradigm (SemAP) [28, 29, 30]. The proposed architecture (Fig. 2) exploits the geospatial capabilities of Geographic Information Systems (GIS) in order to estimate soil erosion yield (e-RUSLE model). In our approach we integrated SemAP and geospatial tools (ArcGIS and GRASS GIS) through the Geospatial Semantic Array Programming paradigm (GeoSemAP). GeoSemAP exploits geospatial tools and Semantic Array Programming for splitting a complex data-transformation model (D-TM) into logical blocks whose reliability can more easily be checked by applying geospatial and mathematical constraints.

Semantic checks are exemplified in the following paragraphs with the notation **::constraint::**. The semantic constraints were implemented within the code with a specialised module [31] of the Mastrave modelling library. A hyperlink to the corresponding online description is provided.
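The role of such checks at a module boundary can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB language used for the model, and the helper names (`is_matrix`, `is_nonnegative`, `check`, `erosion_module`) are hypothetical stand-ins, not the interface of the Mastrave module [31].

```python
# Illustrative stand-ins for semantic checks such as ::matrix:: and
# ::nonnegative::. All names here are hypothetical, not the Mastrave API.

def is_matrix(x):
    """True if x is a non-empty list of equally long, non-empty numeric rows."""
    if not isinstance(x, list) or not x:
        return False
    if not all(isinstance(row, list) and row for row in x):
        return False
    width = len(x[0])
    return all(
        len(row) == width and all(isinstance(v, (int, float)) for v in row)
        for row in x
    )


def is_nonnegative(x):
    """True if every element of the matrix x is >= 0."""
    return all(v >= 0 for row in x for v in row)


def check(value, *constraints):
    """Raise ValueError on the first constraint that the value violates."""
    for constraint in constraints:
        if not constraint(value):
            raise ValueError(f"semantic check failed: ::{constraint.__name__}::")
    return value


def erosion_module(soil_loss):
    """Toy module: validate the input array, then sum the soil loss."""
    check(soil_loss, is_matrix, is_nonnegative)
    return sum(sum(row) for row in soil_loss)
```

Checking the input before any computation means that an inconsistent array stops the information flow at the module where it enters, rather than propagating silently into downstream results.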

### 3.2 Applied techniques

The pre- and post-failure soil loss rate was calculated by applying e-RUSLE [32], a model with low data demand. This model retains all the equations of its predecessor (RUSLE, [33]) and implements an extra factor to account for the effects of soil stoniness on soil erosion. Due to the flexibility of the modelling architecture on which e-RUSLE is based, it is possible to calibrate the model for application at different scales [32]. e-RUSLE was implemented using the ArcGIS software to first estimate the **::nonnegative::**^{1} **::matrix::**^{2} representing the soil erosion rates within the catchment without considering the influence of mass movement. The calculations of soil erosion losses can also easily be carried out using free and open-source software such as GRASS GIS or Quantum GIS.

To determine the slope length factor required in e-RUSLE, the D-infinity (D∞) algorithm of Tarboton [34] was first used to calculate the flow direction and then the flow length. Due to the geomorphological characteristics of the study area, a multiple-neighbour flow algorithm was required, with the D∞ algorithm being one of the most suitable [35, 36, 37]. In GRASS GIS it is possible to apply a multiple-flow approach using the tool ‘r.watershed’ [38]. The slope steepness factor was also slightly modified in comparison to the application of e-RUSLE presented in Bosco et al. [32]. That application was based on Nearing’s [39] equation, which performs best for steeper slopes [40, 32]. However, the Moore and Burch [41] formula is more appropriate for slopes lower than 12.73 degrees because it gives the correct limiting value of zero in the absence of any steepness. A comparison of both formulas is presented in Fig. 3, where a closely matching trend is observed between 0 and 12.73 degrees (0 to 0.22 rad). Consequently, a merged formula can be obtained by using the Moore and Burch equation for slopes of less than 12.73 degrees and the Nearing formula for steeper slopes. To calculate the slope steepness factor of the model, the tool ‘r.slope.aspect’ [42] of GRASS can be used. The majority of the equations on which e-RUSLE is based have been applied using the ArcGIS tool ‘Map Algebra’, which in GRASS corresponds to ‘r.mapcalc’ [43].
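The merged slope steepness factor can be sketched as follows. The sketch assumes the commonly cited forms of the two equations, $S = (\sin\theta / 0.0896)^{1.3}$ for Moore and Burch and $S = -1.5 + 17 / (1 + e^{2.3 - 6.1 \sin\theta})$ for Nearing; these coefficients are not stated in the text above and should be checked against [39] and [41]. Function names are illustrative.

```python
import math

THRESHOLD_DEG = 12.73  # switch point between the two formulas, from the text


def s_moore_burch(theta):
    """Assumed Moore and Burch form; theta in radians. Equals 0 on flat terrain."""
    return (math.sin(theta) / 0.0896) ** 1.3


def s_nearing(theta):
    """Assumed Nearing form; theta in radians."""
    return -1.5 + 17.0 / (1.0 + math.exp(2.3 - 6.1 * math.sin(theta)))


def s_merged(slope_deg):
    """Slope steepness factor: Moore and Burch below the threshold, Nearing above."""
    if not 0.0 <= slope_deg < 90.0:
        raise ValueError("slope must be in [0, 90) degrees")
    theta = math.radians(slope_deg)
    if slope_deg < THRESHOLD_DEG:
        return s_moore_burch(theta)
    return s_nearing(theta)
```

With these assumed coefficients the two curves nearly coincide at 12.73 degrees, so switching formulas at that angle introduces no visible jump in the factor.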

To quantify the effect of the size, position and number of landslides affecting this catchment, the frequency-size distribution model proposed by Malamud et al. [19] was adopted. They found that landslide data from three quite different locations around the world (Italy, Guatemala and the USA) could be described quite well with the three-parameter inverse-gamma distribution

$$p(A_L; \rho, a, s) = \frac{1}{a\,\Gamma(\rho)} \left[\frac{a}{A_L - s}\right]^{\rho + 1} \exp\left[-\frac{a}{A_L - s}\right] \qquad (1)$$

In (1), $p$ is the probability density ($\mathrm{km}^{-2}$), $\Gamma$ is the gamma function, $A_L$ is the landslide area ($\mathrm{km}^2$), $\rho$ (dimensionless) is a parameter which controls the power-law decay for medium and large landslide areas, $a$ ($\mathrm{km}^2$) determines the position of the maximum in the probability distribution and $s$ ($\mathrm{km}^2$) is a parameter which fits the exponential decay behaviour for small landslide areas. Parameter values of $\rho = 1.4$, $a = 1.28 \times 10^{-3}\ \mathrm{km}^2$ and $s = -1.32 \times 10^{-4}\ \mathrm{km}^2$ were shown to provide a good fit to the measured data. A dataset of more than 400 reported landslides that affected the catchment in 2006 was made available and published by Dr Janusz Wasowski of CNR-IRPI, Bari [22, 25]. To obtain the landslide inventory, high-resolution IKONOS satellite imagery was used. To make the interpretation easier, the satellite images were orthorectified and pansharpened. This dataset is not freely available, but the IFFI database [44] is a valuable alternative for applying our modelling approach wherever enough data are available.
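As a minimal sketch, the density can be evaluated in Python with the parameter values quoted above, assuming the usual three-parameter inverse-gamma form of Malamud et al. [19]. The function names are illustrative, and the density is computed in log space only for numerical robustness.

```python
import math

RHO = 1.4      # dimensionless, power-law decay for medium and large areas
A = 1.28e-3    # km^2, controls the position of the maximum
S = -1.32e-4   # km^2, controls the decay for small areas


def landslide_area_pdf(area_km2, rho=RHO, a=A, s=S):
    """Probability density (km^-2) of landslide area under the inverse gamma."""
    x = area_km2 - s
    if x <= 0.0:
        return 0.0  # outside the support of the distribution
    ratio = a / x
    log_p = (rho + 1.0) * math.log(ratio) - ratio - math.log(a) - math.lgamma(rho)
    return math.exp(log_p)


def mode_area(rho=RHO, a=A, s=S):
    """Area (km^2) at which the density peaks: a / (rho + 1) + s."""
    return a / (rho + 1.0) + s
```

With these parameter values the peak lies at roughly $4 \times 10^{-4}\ \mathrm{km}^2$, that is about 400 m², which follows from setting the derivative of the log-density to zero.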

Overall, a reasonable correlation was found between the inverse-gamma distribution of Malamud et al. [19], with the above parameter values, and the frequency-size distribution of the landslide database (Fig. 4). The fit is very good for landslide areas greater than or equal to the peak in the distribution. For smaller landslide areas, to the left of the peak, the agreement is not as good, though parameters $a$ and $s$ could be modified to improve the fit in this range. However, the distribution of Malamud et al. [19], with the parameter values they used, was shown to work over a wide range of landslide sizes from various countries around the world. On this basis we kept the original parameter values in order to see how well they would perform against data from the Rocchetta catchment. They provided a similar fit to the data from our field site, suggesting the possibility of universality in the parameter values and therefore removing the need to calibrate the distribution for local applications.

The data for the smaller landslides do have a greater degree of uncertainty, as their collection could easily have led to either an overestimation or an underestimation of the number of landslides. This could occur either through medium landslides being classified as smaller because they were partly covered by larger landslides, or through smaller landslides being covered by larger ones and therefore missed completely. The main point of this exercise was not to match the landslide-area probability distribution exactly, but to have a physically realistic distribution on which to base our modelling. Predicting when and where a landslide will occur is one of the main challenges for calculating post-failure soil loss in data-poor regions. We exploited the correlation between the measured data and Malamud’s distribution, in combination with Monte Carlo simulation, to analyse the effects of mass movements on soil erosion by water.

Assuming the validity of the proposed inverse-gamma function for calculating the probability distribution of landslide areas, we implemented a simple script (based on SemAP) in the MATLAB language. Starting from a **::scalar positive::**^{3} number representing the number of landslides that occurred in the catchment, we then calculate the number of landslides $\delta N_L(h)$ in the $h$-th class of landslides. Each class is a **::categorical-interval::**^{4} which includes all the landslides with an area from $A_L(h)$ to $A_L(h + 1)$. The classes thus form a partition of **::contiguous interval::**^{5}s in $[0, A_L(h_{max})]$ whose values are found from:
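A form of this relation consistent with the definitions above, assuming $N_L$ denotes the total number of landslides (a symbol introduced here for illustration), is

$$\delta N_L(h) = N_L \int_{A_L(h)}^{A_L(h+1)} p(A_L; \rho, a, s)\, \mathrm{d}A_L$$

The Python sketch below evaluates it by numerical integration. The class edges, the Simpson integrator and all function names are illustrative choices, not the authors' MATLAB script.

```python
import math


def pdf(area, rho=1.4, a=1.28e-3, s=-1.32e-4):
    """Inverse-gamma density of landslide area (km^-2), parameters from [19]."""
    x = area - s
    if x <= 0.0:
        return 0.0
    ratio = a / x
    return math.exp(
        (rho + 1.0) * math.log(ratio) - ratio - math.log(a) - math.lgamma(rho)
    )


def integrate(func, lo, hi, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0


def class_counts(n_landslides, edges):
    """Expected number of landslides in each class [edges[h], edges[h+1]]."""
    return [
        n_landslides * integrate(pdf, lo, hi)
        for lo, hi in zip(edges, edges[1:])
    ]
```

Because the classes are contiguous, the expected counts sum to slightly less than the total number of landslides; the small remainder is the probability mass outside the first and last class edges.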

In order to evaluate the effect of the post-failure changes on the soil erosion rates in the catchment, we applied the Monte Carlo method twice: once to randomly determine the location of a landslide, and a second time to sample the Malamud distribution to assign its size. The Monte Carlo simulation was also implemented in the MATLAB language following the SemAP paradigm and exploiting the capabilities offered by the Mastrave library [29], whose tools were widely used within the code.
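The two sampling steps can be sketched in Python as below. Two assumptions are made that the text does not state: locations are drawn uniformly over a rectangular pixel grid, and sizes are drawn by inverting a gamma variate, since $a/G$ follows an inverse-gamma distribution when $G$ is gamma-distributed with shape $\rho$ and unit scale. All names are illustrative.

```python
import random


def sample_area(rng, rho=1.4, a=1.28e-3, s=-1.32e-4):
    """Draw one landslide area (km^2) from the three-parameter inverse gamma."""
    while True:
        g = rng.gammavariate(rho, 1.0)  # G ~ Gamma(shape=rho, scale=1)
        area = a / g + s                # a / G is inverse-gamma(rho, a)
        if area > 0.0:                  # discard the tiny mass below zero area
            return area


def sample_location(rng, n_rows, n_cols):
    """Draw one pixel uniformly at random (an assumption of this sketch)."""
    return rng.randrange(n_rows), rng.randrange(n_cols)


def simulate_landslides(n_landslides, n_rows, n_cols, seed=None):
    """One Monte Carlo run: a list of ((row, col), area_km2) tuples."""
    rng = random.Random(seed)
    return [
        (sample_location(rng, n_rows, n_cols), sample_area(rng))
        for _ in range(n_landslides)
    ]
```

A location weighted by a landslide susceptibility map could replace the uniform draw in `sample_location` without changing the rest of the run.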

To be more explicit: considering $Y$ as a random variable distributed according to a given probability distribution, it is possible to generate $n$ pseudo-random instances $Y_1, \ldots, Y_n$ with the same distribution. This may be accomplished with a classical Monte Carlo extraction. Let us define $f(\cdot)$ as a certain function of $Y$ which is implemented, within the SemAP paradigm, as a D-TM transforming an instance of $Y$ into the desired output data. Suppose we are interested in computing the integral $A$ of $f(\cdot)$ over a given domain. This implies considering the probability density function $\pi(\cdot)$ of $Y$ over that domain:

$$A = \int f(Y)\, \pi(Y)\, \mathrm{d}Y \qquad (3)$$

Numerically, it is possible to approximately estimate $A$ by exploiting the $n$ Monte Carlo instances $Y_1, \ldots, Y_n$ as

$$\hat{A} = \frac{1}{n} \sum_{run=1}^{n} f(Y_{run}) \qquad (4)$$

where $Y_{run}$ is the $run$-th instance of $Y$, corresponding to the $run$-th Monte Carlo iteration. From the law of large numbers, if $n \to \infty$, $\hat{A} \to A$. In our particular application, $\hat{A}$ is the average over $n$ runs of simulated landslides; in each of them the total erosion by water $f(\cdot)$ is computed for the particular array of landslides $Y_{run}$. The $n$ arrays of simulated landslides are the basis for $f(\cdot)$ to estimate the corresponding post-landslide soil erosion. Each landslide occurring in the $run$-th simulation has an area distributed according to (1). This defines $\pi(\cdot)$ as the probability density function with which each $run$-th array of landslides is distributed.
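The sample-mean estimator described above can be illustrated with a toy case in Python. The function `f` and the sampler `draw` are hypothetical placeholders: here $Y$ is uniform on $[0, 1]$ and $f(Y) = Y^2$, so the exact integral is $1/3$. In the model, `draw` would generate an array of landslides and `f` would compute the corresponding erosion.

```python
import random


def monte_carlo_mean(f, draw, n, seed=0):
    """Average of f over n pseudo-random instances produced by draw(rng)."""
    rng = random.Random(seed)
    return sum(f(draw(rng)) for _ in range(n)) / n


def draw_uniform(rng):
    """Toy sampler: Y uniform on [0, 1]."""
    return rng.random()


def square(y):
    """Toy transformation f(Y) = Y^2, whose exact mean is 1/3."""
    return y * y


estimate = monte_carlo_mean(square, draw_uniform, n=200_000)
```

The error of the estimate shrinks in proportion to $1/\sqrt{n}$, which is why a larger number of runs gives a more stable average.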

The Monte Carlo simulation was iterated 1000 times. For each of the iterations the post-failure changes in soil erosion were calculated and compared with the pre-failure estimates.

The **::matrix::**^{6} representing the cover management factor of the e-RUSLE model was calculated using a land cover map of the study site with a resolution of 5 × 5 m, produced by CNR-IRPI of Bari using ASTER satellite multi-spectral imagery and published in [22]. The map is not freely available, but the CLC [45] is a valid open-access alternative. The post-failure changes in vegetation cover were used within the model for estimating the effect of mass movement on soil erosion. Because of the modular modelling architecture (Fig. 2), the module that calculates the pre-failure C factor can be used as a link between our model and other approaches for measuring the effects of different land disturbances on soil erosion.

The vegetation cover is only partially altered by the slow mass movements that characterize this catchment (see Fig. 1). As the slide surface may locally remain unchanged, we introduced into the model a value representing the post-failure percentage of bare soil. By analysing the landslide dataset, the available pictures and satellite images, and accounting for all the information collected during a field survey carried out within the study area, the percentage of post-failure bare soil cover was estimated to be not less than 20% of the landslide area. For each of the pixels of the modelled landslides in each of the 1000 Monte Carlo iterations, the **::scalar positive::**^{7} **::proportion::**^{8} of bare soil was therefore randomly determined in the range 0.2 to 1.
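A minimal Python sketch of this step follows. The text specifies only the range of the bare-soil proportion, so two assumptions are made here and should be read as illustrative: the proportion is drawn uniformly, and the post-failure cover factor is a linear blend between the pre-failure C value and a bare-soil value of 1. The function names are hypothetical.

```python
import random


def sample_bare_soil(rng, lower=0.2, upper=1.0):
    """Draw a bare-soil proportion, assumed uniform on [lower, upper]."""
    b = rng.uniform(lower, upper)
    # Stand-ins for the ::scalar positive:: and ::proportion:: checks.
    if not (b > 0.0 and 0.0 <= b <= 1.0):
        raise ValueError("semantic check failed: ::proportion::")
    return b


def post_failure_c(c_pre, bare_fraction, c_bare=1.0):
    """Hypothetical linear blend of vegetated and bare-soil cover factors."""
    return (1.0 - bare_fraction) * c_pre + bare_fraction * c_bare


rng = random.Random(0)
bare = [sample_bare_soil(rng) for _ in range(1000)]   # one value per pixel
c_post = [post_failure_c(0.1, b) for b in bare]       # pre-failure C = 0.1
```

Under this blend the post-failure C factor can never fall below its pre-failure value, which matches the expectation that exposing bare soil increases erosion.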

## 4 Results and discussion

Table 1 shows the results of the Monte Carlo simulations. We replaced the mean values obtained by applying equation 4 with the median, because it is more stable: it is only marginally affected by extreme values. By analysing the median over 1000 simulations of the cumulated pre-failure and post-failure soil erosion, an increase of 20% in the total soil loss was estimated. The post-failure soil erosion rate in areas where landslides occurred is, on average, around 3.5 times the pre-failure value.

A bootstrap analysis, based on 10,000 runs, was performed to assess uncertainty. The analysis of the changes in the rate of soil erosion due to landslide occurrence shows post-failure increases in soil loss of approximately 1700 tons per year (bootstrap p ≤ 0.05). This corresponds to an increase of around 22% in the total soil erosion. We also analysed the extent of the area affected by slope instability. The bootstrap analysis shows that in each simulation at least 76 hectares, corresponding to around 8.5% of the catchment, are affected by landslide activity (bootstrap p ≤ 0.05). Compared with the area that presented slope instability in 2006 (around 55 hectares), the applied methodology seems to slightly overestimate the affected area. The graph in Fig. 4 shows that Malamud’s distribution seems to underestimate the number of small landslides (< 300 m²). Nevertheless, the probability density distribution for the Rocchetta landslides from 2006 is in line with those reported by Malamud et al. [19] for precipitation-triggered landslides that took place in Guatemala in 1998. The model is in its early developmental phase, and fine-tuning the fit of the Malamud distribution to small landslides should help to improve the model predictions. However, to better evaluate the limits or the robustness of the proposed inverse-gamma distribution, or of a modified version, further data would be necessary. The bootstrap analysis, with 10,000 runs, performed on the measured data (Fig. 4) shows that the uncertainty associated with a single-year landslide dataset is too high for extrapolating different parameter values. A more detailed analysis based on datasets covering a longer time interval would help to improve the applied methodology.

An additional source of error that needs further investigation arises from the choice of the model for estimating soil erosion and from running it with limited data. There is therefore considerable scope for errors in prediction to be strongly linked to this simplification.
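A percentile bootstrap of the kind used for the uncertainty assessment can be sketched in Python as follows. The sketch assumes a one-sided lower bound on the median at the 5% level, which is one plausible reading of "bootstrap p ≤ 0.05", and uses hypothetical names throughout.

```python
import random
import statistics


def bootstrap_lower_bound(values, stat=statistics.median, n_boot=10_000,
                          alpha=0.05, seed=0):
    """Lower percentile bound of a statistic from resampling with replacement."""
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on n_boot resamples of the same size as the data.
    stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    return stats[int(alpha * n_boot)]
```

Here `values` would hold one result per Monte Carlo run, for example the landslide-affected area in hectares, so that the returned bound is the value exceeded by the resampled median in about 95% of the bootstrap replicates.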

Because the capacity to estimate the changes in soil erosion from landslide activity is largely dependent on the quality of the available datasets, the applied methodology broadens the possibility of a quantitative assessment of these effects in data-poor regions. The obtained results, even considering a possible overestimation, confirm the important role of mass movements on soil erosion and the consequent necessity to better integrate these processes into soil erosion modelling.

## 5 Conclusions

A new method for empirically estimating the importance and extent of landslides on soil erosion losses in data-poor regions has been developed. This has been achieved by sampling the frequency-size landslide distribution proposed by Malamud et al. [19], and stochastically distributing the landslide location across the catchment. Given the increasing threat of soil erosion all over the world and the implications this has on future food security and soil and water quality, an in-depth understanding of the rate and extent of soil erosion processes is crucial.

Each year, on average, between 8.5 and 10% of the catchment shows evidence of landslide activity, which is responsible for a mean increase in the total soil erosion rate of between 22 and 26% over the pre-failure estimate. These results confirm the potential importance of integrating the landslide contribution into soil erosion modelling. While it clearly has limitations, the proposed approach can be seen as a first attempt to assess the landslide-erosion interaction in areas with limited data.

The proposed modelling approach is also suitable for applications with a wider spatial extent and could potentially be implemented in a transdisciplinary context. For example, the relevant effect of wildfires on soil erosion and landslide susceptibility [46, 47] could be modelled with higher reliability by integrating the proposed approach. As stated in de Rigo et al. [47], wildfires can considerably increase soil erosion by water and landslide susceptibility. The changes in landslide susceptibility may in turn affect soil erosion. In general, considering the modelling architecture (Fig. 2), if the module that calculates the pre-failure C factor value provided a layer altered by a different disturbance (e.g. wildfires or pest outbreaks), the presented modelling architecture could be applied to estimate the indirect effect of these disturbances on soil erosion, provided that a new landslide susceptibility map, which considers the altered vegetation cover, is produced.

Although the preliminary results are promising, further research is required before this method can be applied by the scientific community and relevant authorities with any level of confidence. Considering post-failure changes in topography and soil characteristics (e.g. soil armouring [48]), and integrating them within the model, is fundamental for increasing its predictive capacity. A better estimation of the bare soil exposed within a landslide is also fundamental for improving the model. It would also be worthwhile to improve the fit of the Malamud distribution to the data, which at present is not possible due to the limited availability of measured data. To obtain more reliable results, and more robust estimates of the effects of landslides on soil and vegetation cover, it will also be necessary to focus attention on producing a less uncertain zonation of the spatial probability of landslide susceptibility in areas characterized by low data availability [49].

## Authors Bio

Claudio Bosco graduated in natural sciences in 2002 at the University of Milan. His more recent research activities are focused on natural hazards and their link with climate change, combining research into quantitative, robust modelling approaches with expert-driven understanding of environmental processes. His research interests also cover quantitative geomorphology, spatial analysis (GIS based) and wildfire effects on soil degradation processes.

Graham Sander is a Professor of Hydrology in the School of Civil and Building Engineering at Loughborough University in the UK. His research interests cover sediment transport and soil erosion modelling, shallow overland flow, unsaturated subsurface water and contaminant transport.

## Acknowledgements

We would like to thank Dr. Tom Dijkstra for his valuable comments on the manuscript. We also would like to thank Dr. Janusz Wasowski and Dr. Caterina Lamanna for providing the landslide data and Dr. Wasowski also for his fundamental support during fieldwork. This paper is published with the support of the Maieutike Research Initiative.

## Footnotes

Manuscript accepted for publication in *IEEE Earthzine*. The definitive version might differ from this pre-print.

*IEEE Earthzine* 2014, Vol. 7, Issue 2 (not yet published), 2^{nd} quarter theme: Geospatial Semantic Array Programming.

^{1} http://mastrave.org/doc/mtv_m/check_is#SAP_nonnegative

^{3} http://mastrave.org/doc/mtv_m/check_is#SAP_scalar_positive/

^{4} http://mastrave.org/doc/mtv_m/check_is#SAP_categorical-interval

^{5} http://mastrave.org/doc/mtv_m/check_is#SAP_contiguous_interval

^{7} http://mastrave.org/doc/mtv_m/check_is#SAP_scalar_positive

^{8} http://mastrave.org/doc/mtv_m/check_is#SAP_proportion