Trends and drivers of hypoxic thickness and volume in the Northern Gulf of Mexico: 1985-2018

Hypoxia is a major environmental issue plaguing the commercially and ecologically important coastal waters of the Northern Gulf of Mexico. Several modeling studies have explored this phenomenon, but primarily focus on the areal extent of the mid-summer hypoxic zone. Research into the variability and drivers of hypoxic volume and thickness is also important in evaluating the seasonal progression of hypoxia and its impact on coastal resources. In this study, we compile data from multiple monitoring programs and develop a geospatial model capable of estimating hypoxic thickness and volume across the summer season. We adopt a space-time geostatistical framework and introduce a rank-based inverse normal transformation to simulate more realistic distributions of hypoxic layer thickness. Our findings indicate that, on average, there is a seasonal lag in peak hypoxic volume and thickness compared to hypoxic area. We assess long-term trends in different hypoxia metrics (area, thickness, and volume), and while most metrics did not exhibit significant trends, mid-summer hypoxic thickness is found to have increased at a rate of 5.9 cm/year (p<0.05) over the past three decades. In addition, spring nitrogen load is found to be the major driver of all hypoxia metrics, when considered along with other riverine inputs and meteorological factors in multiple regression models. Hypoxic volume, which was also often influenced by east-west wind velocities, was found to be more predictable than hypoxic thickness.


Introduction
Increases in anthropogenic nutrient loading have led to a rise in the number of coastal hypoxic zones (a.k.a.dead zones) across the world (Smith, 2003;Fennel & Testa, 2019), threatening coastal ecosystems and fisheries (Roman et al., 2019;Rose et al., 2019).One of the largest coastal hypoxic zones forms each summer on the Louisiana-Texas shelf of the northern Gulf of Mexico.The primary driver of this dead zone is nitrogen loadings from Mississippi River Basin (Goolsby, 1999), which stimulates organic production that eventually sinks and results in bottom-water oxygen depletion through microbial decomposition (Rabalais et al., 1999).Physical conditions, such as water column stratification, are also a prerequisite for hypoxia formation (Stow et al., 2005;Feng et al., 2014), and climate change is expected to exacerbate hypoxia over time (Lehrter et al., 2017;Del Giudice et al., 2020).Observations of Gulf hypoxia date back as early as 1970, but regular monitoring started in the 1980s (Rabalais et al., 2001).The Mississippi River/Gulf of Mexico Watershed Nutrient Task Force was formed to implement measures to control nutrient loading and reduce the size of hypoxic zone to 5000 km 2 by the end of 2015, which was later extended to 2035, as the original objective remains unrealized (Hypoxia Task Force, 2017).
Over the last several decades, several organizations have conducted monitoring cruises to assess the extent of hypoxia and its ecological effects.Data from monitoring programs are used in developing models capable of estimating hypoxic area (Obenour et al., 2013;Matli et al., 2018), which are in turn used in calibrating forecast models (Scavia et al., 2017;Del Giudice et al., 2020;Li et al., 2023) and assessing fisheries-related impacts (Smith et al., 2017).However, hypoxic volume and thickness are also potentially important from an ecological and fisheries perspective (Scavia et al., 2019).Pelagic and demersal fish are affected by hypoxic volume, as they must move laterally or vertically to avoid the hypoxic layer (Hazen et al., 2009;Zhang et al., 2009).Therefore, estimates of hypoxic volume, in conjunction with hypoxic area, are needed to characterize the severity of hypoxia and help inform management efforts.
Hypoxic volume is more commonly considered in other aquatic systems, such as Chesapeake Bay (Murphy et al., 2010;Bever et al., 2013) and the Baltic Sea (Meier et al., 2011;Lehmann et al., 2014).Hypoxic volume in Chesapeake Bay has been estimated using universal kriging and other spatial interpolation methods (Chehata et al., 2007;Murphy et al., 2010;Zhou et al., 2014).In the Baltic Sea, which is approximately 30 times bigger than Chesapeake Bay, coarser spatial and temporal sampling (Wulff & Ulanowicz, 1989;Conley et al., 2011) has limited the applicability of geostatistical methods (Krapf et al., 2022).Additionally, most geostatistical studies are limited to interpolation through kriging.Kriging provides a smoothed interpolation surface, while conditional realizations (i.e., spatial Monte Carlo simulations) are needed to reliably characterize overall measures of hypoxia (e.g., area and volume) and quantify uncertainties (Chiles and Delfiner, 2009;Zhou et al., 2014).
Hypoxic volume in the Gulf of Mexico remains relatively uncharacterized.Obenour et al., (2013) developed a geostatistical model for estimating hypoxic volume based on dissolved oxygen (DO) measurements from Louisiana University Marine Consortium (LUMCON) midsummer shelfwide cruises, and Scavia et al., (2013) incorporated these estimates into a parsimonious hypoxia forecasting model.However, these models are limited to predicting hypoxic volume at the time of the LUMCON cruise, typically in late July.Additionally, Obenour et al., (2013) modeled hypoxic fraction (HF, i.e., thickness of hypoxic bottom layer divided by total water column depth) as a Gaussian random field, which is typical of geostatistical modeling, but does not fully address the highly skewed and bounded nature of actual hypoxic thickness observations.Fennel et al., (2016) developed a hydrodynamic-biogeochemical model capable of predicting hypoxic volume across the summer, but results have only been systematically compared to midsummer estimates (Obenour et al., 2013).Thus, there remains the need for data-driven estimates of hypoxic volume across the summer season, as intra-seasonal variability is important for assessing fisheries and ecosystem health (Scavia et al., 2019;Matli et al., 2020).
This study advances previous hypoxic volume estimation techniques (e.g., Obenour et al., 2013) in two substantial ways.First, it applies a space-time geostatistical framework (Matli et al., 2018) to observations of hypoxic layer thickness from multiple monitoring organizations to provide estimates of hypoxic volume across the summer season.Second, it investigates a rankbased inverse-normal transformation to address the skewed and bounded data distribution of HF.
Through conditional realizations, the model is used to probabilistically estimate the volume and thickness of hypoxia across the Gulf hypoxia region from May through September .
We also explore trends in area, volume, and thickness of hypoxia.This includes the development of regression models to compare the predictability and key drivers of the different hypoxic metrics.
The total dataset comprises 7939 sampling events collected from 186 monitoring cruises.While Matli et al., (2020) focused only on bottom-water dissolved oxygen (BWDO), this study requires the entire vertical DO profile to determine the bottom-layer hypoxic thickness, and events that do not span the entire water column are filtered out (4.5% of sampling events).Hypoxia is primarily a bottom-water phenomenon, and observations with HF>0.75 (only nine events) may result from measurement error or ephemeral biological and hydrodynamic conditions that result in high levels of near-surface respiration (Rabalais, 1999;Hull et al., 2008).Thus, we cap these "outlier" observations at 0.75.The overall dataset includes 2823 observations of HF greater than zero (histograms provided in SI-2).Our study area is bounded by 94.605-89.512°W longitude, 28.219-29.717°N latitude, and 3-75 m depths, and divided into a 5x5 km grid for estimation purposes.
The DO profiles were collected using handheld or rosette-mounted samplers (Obenour et al., 2013).In general, rosette-mounted samples did not sample the sea floor to avoid damaging the sampling rig.A correction factor was applied to account for this bias, similar to Obenour et al., (2013), as described in SI-3.

Model Formulation
Geostatistical models perform optimally when the response variable can be represented as a stationary Gaussian process (Chiles and Delfiner, 1999;Gnieting et al., 2006).Obenour et al., (2013) proposed modeling HF, rather than thickness, to partially address this issue (hypoxic thickness tends to increase with depth).However, HF values remain right skewed (SI-2) and so in this study we transform HF using a rank-based inverse-normal transformation (INT) via Blom's function (Blom, 1958).Rank-based INT converts a continuous variable (with any underlying distribution) to a normal distribution by applying a quantile function to observations ranked in ascending order.To efficiently back-transform to the original scale, we apply a piecewise quadratic regression.The coefficients and breakpoints of this regression are estimated by minimizing the mean square error when fit to observations at equal intervals along the transformed scale (Fig. 1).A space-time geostatistical model can be conceptualized as follows using transformed HF (HF t ) as the response variable (Eq (1)): where μ is the deterministic component (trend in mean) and η and ε are stochastic components.
Here, μ is a linear model with trend variables (covariates) and variable intercepts (categorical variable), with the latter accounting for interannual variability.Covariates include linear or quadratic trends with northing (N), easting (E), depth (D), day-of-year (T), and BWDO.Variable selection using a geostatistical adaptation of the Bayesian Information Criterion (BIC) is performed to avoid over-parameterization (Obenour et al., 2012).The full system of linear equations for the space-time geostatistical model are provided in Matli et al., (2018).
The variability around the deterministic component is resolved into correlated and uncorrelated stochasticity.Uncorrelated stochasticity (ε) represents environmental microvariability or sampling error.Correlated stochasticity (η) accounts for transient spatial and temporal patterns in the data.
The covariance Q between sampling events i and j, separated by distance s ij and time lag t ij , is modeled using a non-separable, space-time covariance function with exponential form (Eq (2)) (Matli et al., 2018): ,      > 0 Eq (2) where σ ε 2 represents the correlated variance (partial sill) and σ η 2 represents uncorrelated variance (nugget) in HF t .Parameters a (km) and b (days) correspond to approximately one third of the spatial and temporal correlation ranges, respectively.Note that north-south distances are scaled by an anisotropy ratio, α (see Results).Covariance function parameters are estimated using restricted maximum likelihood (Gelfand et al., 2010) to account for the selected trend variables, as outlined in Matli et al., (2018).
The fitted covariance function is used to generate unconditional realizations of HF t , which are then conditioned to the observed data and covariate trends (Chiles and Delfiner, 1999;Matli et al., 2018).These conditional realizations (CRs) account for uncertainty in the covariate trends as well as the stochastic components of the model.CRs of BWDO (Matli et al., 2018) are also required, as BWDO is one of the deterministic covariates for HF t .Moreover, simulations of hypoxic thickness are only appropriate at locations that experience hypoxia (BWDO < 2 mg/L).Thus, we adopt the two-step procedure developed by Obenour et al., (2013), where for each CR of BWDO, we first identify the locations that are hypoxic and then perform a CR of HF t over these locations.
This procedure is repeated 1000 times, and the simulations of HF t are back-transformed to HF and then multiplied by depth to get hypoxic thickness.Next, thickness values are aggregated across the estimation grid to determine hypoxic volume.Results from the 1000 CRs are used to calculate the mean and 95% confidence intervals of these estimates.This process is repeated at a 3-day interval from May to September for 1985-2019.Estimates of hypoxic area, volume, and thickness are highlighted for the LUMCON midsummer cruises (MC) and as summer averages (SA, June-August).To calculate SA hypoxia metrics, we used simulations of area, volume, and thickness for all estimation dates between June and August to determine the mean.For 95% confidence intervals of SA estimates, we summed variances using the properties of correlated random variables (Kottegoda and Rosso, 2008), where correlations among 3-day interval estimates for each year were calculated used lagged correlation coefficients.The geostatistical model was developed in MATLAB, and R was used for processing and visualizing model results (R core team, 2022; The Mathworks Inc., 2022).

Model verification
The INT is not commonly used in geostatistical modeling and has never (to our knowledge) been used in hypoxia modeling.To investigate the efficacy of this transformation, we compare its ability to produce realistic simulations of hypoxia on four different quadrants of the study area, divided by the 20-m isobath into shallow and deep regions and by the outlet of Atchafalaya River into east and west sections.For each quadrant, we compare the distribution of simulated hypoxic thickness (with and without the INT) to that of the observations.

Regression analysis of hypoxia metrics
The  , 2022).For MC models, "spring" refers to April-June and "summer" refers to July, as the LUMCON MC typically takes place in late July.For SA models, "spring" generously refers to April-July and "summer" refers to June-August.For both MC and SA, winter refers to November-March.Finally, wind stress (represented by wind speed squared) was determined as a 14-day weighted average (weights linearly declining backward in time from the cruise mid date) when predicting MC hypoxia (Obenour et al., 2015), or as a June-August average when predicting SA hypoxia.We also tested for interactions between westerlies and nitrogen loads at spring and summer seasonal scales.
We developed regressions for hypoxia metrics aggregated across the entire Louisiana-Texas shelf study area, and across east and west shelf sections, divided at the Atchafalaya River outfall.For east and west sections, we used inputs from the Mississippi and Atchafalaya Rivers, respectively, and for shelfwide metrics we used their summation.Conventional BIC was used in an exhaustive search for variable selection to prevent overfitting (Faraway, 2004).We used mean hypoxia estimates in the variable selection process.However, we used 100 samples of the hypoxia metrics (i.e., from the CRs) to determine the average variance explained (R 2 ) in each case.

Geospatial trends and stochasticity
The BIC-selected trend variables (Table 1) and annual intercepts (Table S4a) explain 30.1% of the spatio-temporal variance in transformed HF.The trend with northing indicates that for every 100 km further north, HF (back-transformed) decreases by 0.08 (relative to a baseline, average HF of 0.21).The quadratic trend with depth indicates that HF is at a maximum at 14 m of depth (all else being equal), and the quadratic trend with BWDO indicates that HF is greatest when BWDO is zero and gradually declines as BWDO increases (Figure S4a).The combined trends indicate that the highest HFs are typically at mid-depth regions on the eastern shelf (Figure S5b).Annual intercepts account for additional year-to-year variability in HF t (beyond the effect of BWDO) and have a range of 1.51 HF t from the lowest year (1988) to the highest year ( 2016), indicating a range of about 0.26 for untransformed HF.
Trends in the BWDO model also influence patterns in hypoxic layer thickness.BWDO trends are the same as those described in Matli et al., (2018), though the coefficients (Table 1) changed slightly (less than 10%) due to the inclusion of new data for 2017-2019.Notably, the BWDO trend model includes a quadratic, intra-annual temporal trend, indicating that hypoxia typically peaks in mid-July.There is also a negative trend with easting indicating BWDO decreases closer to Mississippi river outflow (Figure S5b).
The stochastic components of the geostatistical model explain variability not captured by the deterministic trends.About half (50.4%) of this remaining variability is spatially correlated within ranges of 69 and 43 km along the east-west and north-south axes (accounting for anisotropy) and temporally correlated within a lag 12 days.The other half of the stochasticity (49.6%) can be attributed to sampling error and environmental microvariability (nugget).This is a relatively large nugget compared to the BWDO model (SI-4) and suggests that estimates of hypoxic volume will be more uncertain than area.

Conditional realizations and hypoxia metrics
The mean of simulated hypoxic thickness values (from CRs) can be mapped across the study area.
Hypoxia in 2008 was relatively severe, especially in July (Figure 2, left panels), when compared to the long-term average (Figure 2, right panels).Also shown are the regions where there is at least a 50% probability of hypoxia occurring.Across our multi-decadal study period, approximately 12% of estimation locations have a greater than 50% chance of experiencing hypoxia in July and August, compared to only 8% in June.The most consistent and severe hypoxia is on the eastern shelf, closer to the Mississippi River outfall.Throughout the study period, hypoxic area and volume exhibited similar intra-seasonal patterns (Fig. 3).Based on results for years with at least two shelfwide cruises, hypoxic area reached its annual estimated peak in May, June, July, August, and September in 0, 3, 15, 6, and 0 years, respectively.For volume, the corresponding results are 0, 2, 11, 10, and 1 years, respectively, indicating that peak volume tends to lag peak area, at least on average.Of course, the results for any given date have substantial uncertainty (see 95% intervals, Fig 3), but taken together, these results suggest both patterns as well as aberrations in the seasonal progression of hypoxic area and volume.
Previous process-based modeling research has also estimated the seasonal progression of hypoxia in specific years (e.g., 2005 and2006;Fennel et al., 2016), which can be compared to our results.
In general, process-based simulations suggest more small-scale temporal variability than our estimates.In the geostatistical model, temporal microvariability is reflected in the wide 95% estimation intervals.Simulations of volume from the Regional Ocean Modelling System (ROMS) and Finite Volume Community Ocean Model (FVCOM) indicate that hypoxic volume peaked in August and September of 2005 and 2006, respectively.For 2005, the estimates of peak volume from FVCOM were similar to our estimates (within 10%), but predictions from ROMS were approximately 40% higher, though within our 95% intervals.For August and September 2006, predictions from both FVCOM and ROMS were at least 100% higher than our estimates and outside of our 95% intervals.Importantly, our geostatistical estimates are informed by monitoring cruises conducted in August and September of 2006 (Fig. 3), reducing their uncertainty.For another process-based model, the Navy Coastal Ocean Model (NCOM), peak hypoxic volume simulations in 2005 (Fennel et al., 2016) were at least 50% lower than our estimates and outside our 95% intervals.NCOM simulations for 2006, however, compare well to estimates from our model with a similar peak volume and temporal evolution.Considering these examples, there is often substantial disagreement between hypoxic volume estimates from different process-based models, and our geostatistical estimates provide an important line of evidence that could help guide future improvements to hydrodynamic-biogeochemical modeling.

Model verification
Simulations (i.e., CRs) of hypoxic thickness can be compared to the original observations to help ensure that they are realistic (Fig 4).At the same time, we do not expect simulations to perfectly mirror the observational distributions, as many of the monitoring cruises are biased toward areas and times of intense hypoxia (e.g., LUMCON mid-summer cruises specifically target the hypoxic zone) or specific regions of the shelf (Louisiana Department of Wildlife and Fisheries targets near-shore, eastern regions).In general, 20-55% of observations tend to be hypoxic depending on the region of the shelf (see bar charts of HT>0 in Fig 4).The intensity of hypoxia decreases to the west and further offshore, away from the influence of the Mississippi and Atchafalaya River outfalls, as expected based on previous studies (Rabalais et al., 2007;Fennel et al., 2013).CRs tend to have a somewhat lower percent occurrence of HT>0 (~15-35%,

Comparison of hypoxic area, volume, and thickness
To help characterize interannual variability, we present the June-August summer average (SA) of hypoxic area, volume, and thickness (Fig. 5).We also report these hypoxia metrics at the time of the LUMCON midsummer cruise (MC) for consistency with previous studies (e.g., Obenour et al., 2013).For 2016 and 2019, it is important to note that data were only available from SEAMAP, which conducts monitoring in late June and early July over limited portions of the study area, often before hypoxia peaks.Geostatistical estimates of SA hypoxia for these years have excessive uncertainty and are unusually large (Fig. 5), indicating that the geostatistical model is unable to constrain estimation uncertainties without the benefit of more detailed (e.g., LUMCON) cruises collected closer to the period of peak hypoxia (late July and August).Thus, 2016 and 2019 are omitted from the following correlation and trend analyses.
Our MC estimates can be compared to those of Obenour et al., (2013), which did not include the INT.For the common period , correlations (r 2 ) of our new volume and thickness estimates with the previous MC estimates are 0.85 and 0.76, respectively.The hypoxic area correlation is much higher (0.96) since the INT only affects thickness and volume estimation.
On average, our new MC estimates of volume are 11% higher than the results from Obenour et al., (2013), but differences in estimates from individual years range from -21% to 81%.The highest change in volume (81%) was for 2000, which was a year of unusually mild hypoxia, so this increase translated to an absolute change of only 12 km 3 .The years most likely to have maximum and minimum MC hypoxic volumes (2008 and 1988, respectively) remain unchanged.However, the confidence intervals in volume estimates from the current model are 36% wider compared to the MC estimates from Obenour et al., (2013), comparable to a 30% increase in uncertainty in area estimates from Matli et al., (2018).The model structure used in Obenour et al., (2013) did not include the temporal aspects of the observations collected in a given cruise, which is similar to assuming that all observations in a cruise are collected at the same time.Our space-time model considers shelfwide cruises more realistically as multiday events.Because not all samples are collected on the estimation date (the middle of the cruise period), temporal stochasticity adds to the estimation uncertainty.
Additionally, we compare correlations among the various hypoxia metrics generated in this study (Fig 5).Area and volume are most correlated with each other, with r 2 values of 0.83 and 0.90 for MC and SA estimates, respectively.For thickness and volume, correlations drop to 0.32 and 0.46 for MC and SA, respectively.Finally, hypoxic area and thickness were least correlated, with r 2 values of 0.06 and 0.23 for MC and SA estimates, respectively.The correlations among SA estimates are consistently higher than the correlations between MC, potentially because MC estimates reflect short-term meteorological variability that influences the structure of hypoxia, while SA estimates tend to average out these fluctuations.Overall, these results suggest that most of the variability in hypoxic volume can be attributed to changes in hypoxic area.At the same time, variability in hypoxic thickness, which is only modestly correlated to area, is also important.
MC and SA estimates were also analyzed for significant long-term temporal trends.To account for estimation uncertainty, we took 1000 samples from the distributions of the hypoxia estimates (CRs).The trends from 1000 samples and the associated level of significance (p-value from two-sided test) are averaged to determine the overall significance level of trends in MC hypoxia metrics.Consistent with results from Matli et al., (2018), there is no significant temporal trend in estimates of hypoxic area or volume (p-val>0.1).However, MC thickness estimates appear to be increasing at a rate of 5.9 cm/yr (p-val<0.05),while SA thickness is increasing at a rate of 4.4 cm/yr (p-val<0.05).These results are generally consistent with Obenour et al., (2013), who found a significant trend in hypoxic layer thickness (6.9 cm/yr) modeling only MC data through 2011.
In addition to LUMCON MC estimates, there are estimates from other shelfwide monitoring programs in 26 of the 33 years considered.The estimate associated with one (or more) of these other cruises was higher than the MC estimate in 11, 18, and 20 years for area, volume, and thickness, respectively (Fig. 5, red dots).In these years, the MC estimate was exceeded, on average, by 46%, 52%, and 25% for area, volume, and thickness, respectively.This, along with the substantial intra-seasonal variability in hypoxia (e.g., Fig. 3) demonstrate that MC estimates are not always representative of hypoxic severity for a given year.

Drivers of hypoxic area, volume, and thickness
Much of the research on Gulf hypoxia focuses on how riverine inputs influence hypoxic area, as this is the metric that traditionally drives watershed management policies (Scavia et al., 2017).However, hypoxic volume and thickness also play an important ecological role and remain relatively unexplored in the Gulf (Breitburg, 2002;Scavia et al., 2019).Here, we provide a comparison of the predictability of hypoxic area, volume, and thickness based on various loading and meteorological inputs that have been considered in previous modeling studies (e.g., Del Giudice et al., 2020).
Results show that inter-annual variability in hypoxia metrics  can be explained to varying degrees across different shelf sections, with R 2 values ranging from 0.07-0.56(Table 2).In general, volume estimates are less predictable than area estimates (26% lower R 2 , on average), and thickness estimates are less predictable than both area (65% lower R 2 , on average) and volume estimates.Across all three hypoxia metrics, MC estimates are more predictable than SA estimates. Al selected predictor variables are significant at a 95% confidence level (p-val<0.05).Consistent with most research conducted in the Gulf region (Scavia et al., 2004;Rabalais et al., 2007), spring nitrogen load from the Mississippi River Basin is found to be the primary driver of hypoxic area, volume, and thickness.Winter nitrogen load is also found to be a significant predictor of SA hypoxic area estimates, consistent with Del Giudice et al., ( 2020), but it is not selected in models for volume and thickness.Summer flows, which are expected to increase the density gradient and slow the reaeration of bottom waters, were selected as significant predictors in just two of the models.Year is selected in two models of hypoxic thickness, confirming a positive temporal trend that is independent of nutrient loading.
Wind velocities are also expected to influence the formation, propagation, and persistence of hypoxia.The associations between east-west wind velocities (westerlies are positive) and hypoxia metrics remained consistent within the different shelf sections.Wind variables are mostly selected in models for MC estimates but appear to have limited relevance for predicting SA estimates, perhaps because meteorological variability tends to be averaged out over longer time periods.Wind shear (weighted wind speed squared over two weeks preceding the mid date of the MC) was also occasionally selected, likely because it disrupts stratification, allowing for reaeration of the water column (Turner et al., 2008).
Summer winds have varied influence depending on the section being considered.For hypoxic area and volume, summer westerlies promote hypoxia on the eastern section, but tend to reduce it on the western section.This is consistent with previous research (Feng et al., 2014;Laurent et al., 2018) showing that upwelling (westerly) winds in June-July lead to a higher likelihood of hypoxia on the eastern shelf but reduce hypoxia on the western shelf.For thickness, however, summer winds are only predictive of shelfwide estimates (not east or west section estimates), indicating they lead to thicker hypoxic layers.This may be a result of a rise in the pycnocline along the coast in response to upwelling circulation patterns (O'Donnell, 2010).
In the spring, easterly wind patterns (i.e., negative east-west winds) prevail on the Gulf shelf.Strong spring easterlies are associated with increased MC hypoxia estimates on the western shelf (and also the entire shelf, for hypoxic area).Additionally, an interaction between spring nitrogen loads and spring winds is selected for MC hypoxic volume on the eastern section.Overall, this interaction indicates that increasing easterly winds reduce hypoxia on the eastern shelf, opposite to the effect for the western shelf (Fig 6).The interaction also suggests that nutrient loading becomes more influential on the eastern shelf when easterlies are weak (Fig 6,varying slopes).Weak easterlies likely result in an increased availability of nutrients in the delta region between Mississippi-Atchafalaya River outlets (Walker et al., 2005).Other studies conducted in the region found similar effects of east-west winds on formation of hypoxia (Feng et al., 2014;Obenour et al., 2015;Zhang et al., 2012).

Limitations
The geostatistical model developed in the current study uses spatio-temporal data collected at irregular cadence and limited spatial coverage to provide continuous estimates of hypoxic volume and thickness across the summer in the Gulf of Mexico region.While these estimates provide new insights to the propagation and distribution of hypoxia in the region, the model has certain limitations.The accuracy of geostatistical models is highly dependent on the quality and availability of data at regular intervals.Data availability at irregular cadence limits the models ability to constraint uncertainty in estimated values during early and late summer months with minimal monitoring efforts.Additionally, the model structure makes it difficult to capture the influences of changes in hydrodynamic-meteorological factors on the estimated values.Future work in improving this model can focus on incorporating covariates that capture short term changes in system conditions, like the fusion approach employed in Matli et al., 2020.Furthermore, the regression analysis on hypoxia metrics is also preliminary with limited scope and could benefit from a more rigorous study considering additional riverine and meteorological inputs.

Conclusions
Through the current study, we have developed a geostatistical model capable of estimating highlighting the need for reliable monitoring mid-to-late summer, when hypoxia is typically most severe.At the same time, estimates for years with multiple cruises are more constrained and suggest the need for improvements to hydrodynamic-biogeochemical models, like ROMS, FVCOM, and NCOM, which may not always provide realistic simulations, especially toward the end of the summer season.Overall, the intra-seasonal variability and uncertainty in hypoxia estimates reinforces the need for frequent hypoxia monitoring (Rabalais et al., 2007).
Long-term trend analyses corroborate the results from Matli et al., (2018) and Obenour et al., (2018), indicating no significant trends in hypoxic area or volume since 1985, but suggest an increase in hypoxic layer thickness.This temporal trend also appeared in some multiple linear regressions for hypoxic thickness, indicating it exists independent of seasonal nutrient loading.
Assessing the cause of this increasing trend is beyond the scope of this study, but will hopefully stimulate further research, as it may be related to changes in physical properties (i.e., changes in pycnocline depth), or changes in the nature of oxygen demands over time (Turner et al., 2008).
Multiple linear regression analysis based on common hypoxia predictor variables, found spring nitrogen loads from the Mississippi River Basin to be the primary driver of hypoxia across all shelf regions and metrics (area, volume, thickness).However, the analysis shows that volume and (especially) thickness are harder to predict than hypoxic area.These results, along with low correlations between hypoxic area and thickness over time, emphasize the need for considering multiple metrics in hypoxia management.Most research in the Gulf region has focused on evaluating the effects of changing BWDO conditions and hypoxic area on fish and shrimp catch rates.The availability of spatially and temporally resolved estimates of hypoxic thickness from this study can further enhance our understanding of hypoxia effects on aquatic life in the Gulf.

Fig 2 .
Fig 2. Mean of simulated bottom water hypoxic thickness on 15 June, 15 July, and 15 August in Fig 4) than observations, consistent with expectations, since the model is not biased toward times and locations of intense hypoxia.The distribution of HT varies substantially across different portions of the shelf (Fig 4, SI-7).In the shallow regions (<20 m), most observations and simulations of HT are less than 5 m, while in the deeper regions, around half of HT values exceed 5 m.Comparisons between east and west suggest fewer differences, although the east section appears to have more frequent occurrences of hypoxic thicknesses exceeding 5 m.While all results show substantial right-skew, the CRs in the deep regions exhibit more variance than the observations, while in the shallow regions they exhibit less variance.For comparison, we also present results based on a geostatistical model developed without the INT of HF.Without this transformation, many HF values are simulated as negative (and treated as zero) even though BWDO is simulated as hypoxic (below 2 mg/L).This is apparent in the substantial reduction in percent HT>0 for the model without the INT (dashed lines in Fig 4 bar plots).With the transformation, simulations of zero HT in hypoxic waters are avoided, and the occurrence rate of HT>0 is more realistic.Moreover, omitting the INT produces distributions of hypoxic thickness with a negative bias and too little variance, particularly in shallower shelf regions (Fig 4, dashed curves).On the other hand, in the deeper sections, the INT tends to result in distributions with a positive bias and more variance, such that there is not an improvement in the distributions of hypoxic thickness for deeper locations.Overall, the model with INT provides more realistic simulations of HF and thickness relative to the standard Gaussian geostatistical model.Without the INT, the model appears to under-simulate occurrences of HT>0 across the entire study area, and it has a negative bias in the shallow shelf sections, where hypoxia is most common.Over the entire shelf, the model without INT under-simulates occurrences of HT>0 by 61% relative to the observations, and it undersimulates mean HT (across the entire study area) by 61% relative to observations (0.57 m vs. 1.48 m).At the same time, the model with INT simulates HT>0 at a rate of 30% across June-August, and results in an overall average hypoxic thickness of 1.03 m, more comparable to the observations (SI-7).

Fig 4 .
Fig 4. Probability density distributions of observed hypoxic thickness and geostatistical

Fig 5 .
Fig 5. Estimates of hypoxic volume, thickness, and area for summer average conditions (SA),

Fig 6 .
Fig 6.Influence of interaction between spring nitrogen (N) load and spring east-west wind velocity thickness and volume of coastal hypoxia across the summer based on the limited observations of various monitoring programs.Transforming HF values to a more Gaussian distribution (i.e., using INT) within the geostatistical model resulted in more realistic simulations of hypoxic layer thickness across our study area, based on comparisons with observational distributions.However, years with no midsummer cruises (2016, 2019) result in excessive estimation uncertainty, MC and SA estimates are used as response variables in multiple linear regressions with available hydro-meteorological drivers.For SA, we only use estimates from 1992-2015, which is a continuous period that includes multiple shelfwide cruises in each year, consistent with Del Giudice et al., 2020.Candidate predictor variables were motivated by previous studies (e.g., Del Giudice et al., 2020) and include aggregations of nutrient loading over winter and spring seasons (Casey, 2021), river flows in summer (USACE, 2022), east-west wind velocities over spring and summer seasons, and wind stress around the time of prediction (NOAA

Table 1 .
Trend coefficients (β) and associated standard errors (σ β ) of BIC selected variables for Note: Units of trend coefficients (β) of HF t model are 1/unit of covariate, and BWDO model are mg/L per unit of covariate.

Table 2 :
Regression fit (R 2 ) and coefficients for BIC-selected variables for models of hypoxic area (A), volume (V), and average thickness (T) across the entire shelf (i.e., shelfwide, SW), and east (E) and west (W) of the Atchafalaya River outfall.Models were developed for both midsummer LUMCON cruise (MC) and summer-average (SA) estimates.Asterisks denote an interaction between spring nitrogen (N) loads and spring westerly wind velocity for E MC volume (with a coefficient of 0.18).Model intercepts provided in SI-9.