Post-Hurricane Vital Statistics Expose Fragility of Puerto Rico’s Health System

Importance Hurricane Maria made landfall in Puerto Rico on September 20, 2017. As recently as May of this year (2018), the official death count was 64. After a study describing a household survey reported a much higher death count estimate, as well as evidence of population displacement, extensive loss of services, and a prolonged death rate the government released death registry data. These newly released data will permit a better understanding of the effects of this hurricane. Objective Provide a detailed description of the effects on mortality of Hurricane Maria and compare to other hurricanes. Design We fit a statistical model to mortality data that accounts for seasonal and non-hurricane related yearly effects. We then estimated the deviation from the expected death rate as a function of time. Setting We fit this model to 1985-2018 Puerto Rico daily data, which includes the dates of hurricanes Hugo, Georges, and Maria, 2015-2018 Florida daily data, which includes the dates of Hurricane Irma, 2002-2004 Louisiana monthly data, which includes the date of Hurricane Katrina, and 2000-2016 New Jersey monthly data, which includes the date of Hurricane Sandy. Results We find a prolonged increase in death rate after Maria and Katrina, lasting at least 207 and 125 days, resulting in excess deaths estimates of 3,400 (95% CI, 3,100-3,700), and 1,800 (95% CI, 1,600-2100) respectively, showing that Maria had a more long term damaging impact. Surprisingly, we also find that in 1998, Georges had a comparable impact to Katrina’s with a prolonged increase of 106 days resulting in 1,400 (95% CI, 1,200-1,700) excess deaths. For Hurricane Maria, we find sharp increases in a small number of causes of deaths, including diseases of the circulatory, endocrine and respiratory system, as well as bacterial infections and suicides. Conclusion and Relevance Our analysis suggests that since at least 1998, Puerto Rico’s health system has been in a precarious state. Without a substantial intervention, it appears that if hit with another strong hurricane, Puerto Ricans will suffer the unnecessary death of hundreds of its citizens. Key Points Question: How does the effect of Hurricane Maria on mortality in Puerto Rico compare to the effect of other hurricanes in Puerto Rico and other United States jurisdictions? Findings: We estimate about 3,000 excess deaths after Maria, a higher toll than Katrina. Only other comparable effect was after Georges, also in Puerto Rico. For Georges and Maria, we observe a prolonged death rate increase of more than 10% lasting several months. The causes of death that increased after Maria are consistent with a collapsed health system Meaning: Puerto Rico’s health system does not appear to be ready to withstand another strong hurricane.


Introduction
Hurricane Maria made landfall in Puerto Rico on September 20, 2017, interrupting the water supply, electricity, telecommunications networks, and access to medical care. In early May 2018, the official death count stood at 64 1 . This figure was in conflict with estimates obtained by groups that ostensibly had access to death counts for September and October from the demographic registry of Puerto Rico. By comparing these two numbers to historical averages, additional deaths were estimated to be in excess of 1,000 [2][3][4][5] . However, as of May 2017, the government of Puerto Rico was not releasing the 2017-2018 data.
On May 29, 2018 a multi-institutional study was published 6 , here referred to as the Harvard Study, describing a survey of 3,299 households and reporting a death count estimate of 4,645 (95% CI, 498). It also reported an extensive loss of services after the hurricane. Perhaps most importantly, the study showed evidence of a sustained increased effect on mortality throughout this extended period. These findings underscored the importance of a careful analysis to determine if, for example, there was a systematic increase in deaths due to indirect effects, if a specific demographic was at greater risk, and what type of medical conditions needed most attention.
The Harvard study received worldwide media coverage and three days after its publication, while under significant public pressure and facing a lawsuit 7 , the government finally made the data public and acknowledged the possibility of a higher death count of 1,427 8 . This number is consistent with the value one obtains by subtracting the 2017 monthly counts from the 2013-2016 average using the newly released data (Supplementary Table 1): (2928-2399)+(3040-used these data to construct the daily counts for the 2015-2018 period. Exploratory data analysis showed that data were incomplete after May 31, 2018 (Supplementary Figure 1) and discarded data past this date. Yearly population estimates for the island were obtained from the Puerto Rico Statistical Institute. We computed daily population estimates via linear interpolation (Supplementary Figure 2A). To obtain an estimate of the population displacement after Hurricane Maria, we used population movement estimates from Teralytics Inc. 14 (Supplementary Figure 2B).
We combined these two datasets to obtain a final estimate of the population (Supplementary Figure 2C).

Hurricane Irma (Florida):
We requested daily death counts from Florida's Vital Statistic System and obtained data from 2015 to 2018. For consistency, we discarded data past May 31, 2018. Yearly population estimates were obtained from the US Census for 2015-2017. We computed daily population estimates using interpolation and extrapolated using a linear model for 2018. Furthermore, we used data provided by Teralytics Inc. to estimate changes in population due to immigration from Puerto Ricans due to Hurricane Maria (Supplementary Figure 2D).

Hurricane Katrina (Louisiana):
We requested daily death counts from Louisiana's Vital Statistic System and obtained data from [2003][2004][2005][2006]. For Louisiana, we also obtained monthly death counts from the Underlying Cause of Death database through CDC WONDER for 2000-2008 15 . These two datasets did not match for the months following Hurricane Katrina (Supplementary Figure 3). Since the data for August 2005 matched, we used the daily data to divide the monthly counts for August into before, during, and after the hurricane counts. We obtained population estimates from the US Census and computed daily population estimates via linear interpolation (Supplementary Figure 2E).

Hurricane Sandy (New Jersey):
. CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint We obtained monthly death counts for New Jersey from the Underlying Cause of Death database from 2000-2016. We also obtained yearly population estimates from the US Census and interpolated to obtain monthly estimates. (Supplementary Figure 2F)

Hurricane Harvey (Texas):
We requested daily death counts from Texas' Vital Statistic System, but our petition was denied.
The Underlying Cause of Death database does not have data available for 2017.
Here ) ",$ is an offset to account for the changing population size, . " accounts for the year-to-year variability not due to hurricanes, 0(%) is the seasonal effect for the %-th day of the year, 4 ",$ = 365 * (& − 1) + % is time in days, and 3(4 ",$ ) accounts for the remaining variability not explained by the Poisson variability. So, for example, a virus epidemic will make 3(4 ",$ ) go up slowly, eradication of this epidemic will make 3(4 ",$ ) go down slowly, and a catastrophe will make 3(4 ",$ ) jump up sharply. We therefore assume 3(4 ",$ ) is a smooth function of 4 ",$ except for the days hurricanes make landfall in which the function may be discontinuous.
Since 0(%) is seasonal, we use Fourier's theorem and model it as: We include an intercept >, which represents the baseline rate for the entire period being studied.
We assume that 3(4 ",$ ) is a natural cubic spline with M equally spaced knots N B , … , N P , except that the closest knot to the hurricane day is changed to be exactly on the hurricane day and we permit a discontinuity at this knot. Since natural cubic splines can be represented as a linear combination . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint of basis function and 0(%) is a linear combination of known functions, ours is a generalized linear model (GLM) and, in theory, we can estimate . " , 0, and 3 using maximum likelihood estimates.
However, because we want 3 to be flexible enough to capture relatively high-frequency signals, we instead implement a modular approach that first estimates the . " s and 0, then uses these as offsets to estimate 3 . Specifically, we assume that for the non-hurricane years, . " + 3(4 ",$ ) average out to 0 across years and use this assumption to obtain the MLE for 0(%), denoted with s R(%). We then use s R(%) as an offset to estimate the year-to-year deviations . " using only months not affected by hurricanes (March-August). We then use the estimate α T " + s R(%) as an offset, estimate 3(4 ",$ ) with the MLE and use standard GLM theory to estimate standard errors. Finally, due to lack of data to estimate . " for 2018, we extrapolated from the estimates from the previous year (Supplementary Figure 4).
For the seasonal effect we use K=3 since it results in an estimate that captures the general shape

Excess death estimates
We first determined the period of indirect effect for each hurricane. To do this, we defined the interval starting on landfall day, denoted here with 4 U , until the first day, 4 ",$ > 4 U , for which 3(4 ",$ ) ≤ 0. Because we do not observe 3(4 ",$ ), and instead obtain an estimate, denoted here with 3 Y (4 ",$ ), . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint we took the conservative approach of defining the period with the day for which the lower part of a marginal 95% confidence interval crosses 0: 3 Y Z4 ",$ [ − 1.96]_`3 Y (4 ",$ )a ≤ 0. We denoted this interval with b and defined the excess deaths by adding the observed deaths minus expected deaths for every day in the interval: We construct a 95% confidence interval using the following approximation for the Poisson model:

Natural variability
We quantify the day-specific variability with the observed standard deviation across non-hurricane years for 3(4 ",$ ). Because of years like 2001 and 2014, we estimate this standard error with the medial absolute deviation. We refer to this as the natural variability.

Cause of Death
To examine if any cause of death was more prevalent after the hurricane, we used the individual records data spanning 2015-2017. We did not include 2018 data because it appears incomplete (Supplementary Figure 8). We divided causes of death into 30 categories (Supplementary Table   2) and, for each of these, we computed the observed death rate during the September 20 -December 31 period for 2017. We then compared these to the expected rates computed with the 2015-2016 data for that same period. We used the Poisson model to compute confidence intervals. To estimate a daily effect for a cause of death, we fit the Poisson GLM model described above.

Code:
All the code used to implement the procedures described above are included in a GitHub repository: (https://github.com/rafalab/maria). We conducted our analysis using R version 3.4.4.

Results
. CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint

Indirect effects
The death rate in Puerto Rico increased by 73.9% (95% CI, 63.1%-85.4%) the day after Hurricane

Cause of Death
Increases in rates were not uniformly seen across all cases of death after Hurricane Maria.
Instead, we observed increases for a subset of the causes ( Figure 3A). Not surprisingly, storm related deaths showed the largest increase. However, in terms of total excess deaths diseases . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint of the circulatory, nervous, endocrine and respiratory systems explained well over 65% of deaths until at least December 31, 2017 (Table 1, Figure 3B). The indirect effects were for these causes were substantial ( Figure 3C).

Death Rate by Age and Gender
The most affected demographic groups were, from most to least: 80 years and older, 70-79, and 60-69 and 50-59 with this last group mostly affected by indirect effects (Figure 4). Although younger demographics were not significantly affected (Supplementary Figure 10), these results demonstrate that a large proportion of the population was indeed affected by the indirect effects.
We found no significant differences between genders (Supplementary Figure 10).

Natural variability in excess death rates
The standard deviation of 3 Y (4 ",$ ) taken across non-hurricane years translated to increases of about 4%, with values as high as 6% in parts of the winter (Supplementary Figure 11A). As a result, for a period of, for example, 103 days (September 20 to December 31) these levels of variability translate into a standard deviation of excess deaths of about 300. For a period of 207 days, the standard deviation grows to about 500 (Supplementary Figure 11B). The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint a formal statistical model, we estimated close to 1,400 excess deaths due to this hurricane, an overall impact similar to that of Katrina. In contrast to the response in Louisiana, as far as we know, no systematic effort was put in place to improve Puerto Rico's electrical grid nor its fragile health system. On the contrary, it appears that the grid continued to deteriorate for the next 19 years 19 . Tragically, after Hurricane Maria made landfall the electrical grid was destroyed leaving 100% of the population 20 , including health facilities, without electricity. It has been well documented that restoration of the electrical grid has been slow 21 , with some estimates reporting that only 30% of the population had electricity a month after the tropical storm 22 . We estimate that, as a result of this fragile health system, a large proportion of the population was affected and as many as 3,000 excess deaths occurred. The insights presented in our analysis should be considered in preparation efforts for the next hurricane.

Discussion
It is important to note that our estimates and confidence intervals are for the excess deaths above the expected count and that, given the observed variation in non-hurricane years, we can't attribute all these deaths exclusively to the hurricane, nor can we say that the toll was actually much higher. The same is true for other published estimates 4,9,13 . If we include this natural variability in the uncertainty estimates, the confidence intervals would grow by as much as 1,000 on each limit. This underscores the importance of going beyond analyses based on just monthly counts and historical averages. For example, the sharp jump and then slow decrease in death rate observed on landfall day (Figure 1) provides strong evidence that the observed increases are in fact due to hurricane. Studying specific causes of the excess deaths ( Figure 3, Table 1), which do not show increases for causes such as murders, viruses, and cancer, provide further support for the hurricane being the main cause.
In our analysis, the number of knots defining our splines, the number of harmonics in the seasonal effect, and the way we extrapolate to estimate the 2018 death rate were chosen by visual We also examined how the estimate changed with different population estimates (Supplementary Figure 15). None of the changes led to a different overall conclusion. The code needed to recreate all the figures and tables is available at: https://github.com/rafalab/maria and we invite others to use our publicly available code and data to try out other approaches.

Acknowledgments
We thank María M. Juiz Gallego and Jose A. López Rodríguez from the Department of Health of Puerto Rico for diligently providing all the data we requested. We thank the Puerto Rico Institute of Statistics for providing population data. We thank Canay Deniz, Andrea Samdahl, Lara Montini and Ilya Vasilenko from Teralytics for sharing their data and providing helpful explanations. We thank Matthew Kiang for many valuable demography insights and Deepak Lamba Nieves for suggesting readings on Puerto Rico's electrical grid. We thank Raul Figueroa for critiques that led us to perform the sensitivity analysis. Finally, we thank all the authors of NEJM paper whose publication appears to have resulted in the government releasing the data. In particular, we thank     The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint

Statistical Methods: Monthly counts
For the monthly data, we fit a monthly version of the model above. Because the counts are much larger once we aggregate at this level, we made use of the normal approximation to the Poisson.
Specifically, we defined the monthly rates as ! ",$ ≡ & ",$ /( ",$ where & ",$ is the average number of deaths in month ) of year * and ( ",$ is the person years for that period. Thus, we collapsed the model above to a monthly version as follows: we assumed that the monthly rates can be described with the following model: ! ",$ = , " + . $ + / ",$ 0 ",$ + 1 ",$ . Here , " accounts for year-to-year variability as in daily data model and . $ is the average of . (3) for all days 3 in month ). Because we no longer need splines to model the effects, we instead use indicator functions to denote if a month/year was affected by the hurricane. Specifically, we define 0 ",$ as an indicator that is 1 for the months ) in year * that were affected by the hurricane and 0 otherwise. The parameter / ",$ thus represents the effect of the hurricane on death rate and is equivalent to the integral of 5(6 ",7 ) for 6 ",7 in month ) of year *. The natural, yet non-hurricane related variability, is represented by the term 1 ",$ which are assumed to be independent and normally distributed with average 0 and month-specific standard deviation 8 $ . Notice that this is a standard linear model and the estimates can be obtained with least squares.  The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint Figure 6: Model fit check. We compute the residuals after fitting the seasonal and yearly effects and plot these against time. The fitted 5(6 ",7 ) and marginal 95% confidence interval are The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint Figure 11: Natural non-hurricane related variation. For each day we estimated the standard deviation using median absolute deviation of 5(6 ",7 ). The dashed lines are the 5(6 ",7 ) and the grey ribbon is plus or minus two standard deviation The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint Death rate = 8.50 Death rate = 9.00 Death rate = 9.50 . CC-BY 4.0 International license is made available under a The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It . https://doi.org/10.1101/407874 doi: bioRxiv preprint